Dissertation directed by: Professor Joseph B. Bernstein

The aerospace industry is concerned that as semiconductor feature sizes are reduced in
future  technology  generations,  device  lifetime  will  decrease  as  well.  Inherent  device 
failure  mechanisms,  such  as  electromigration,  hot  carrier  effects  and  time  dependent 
Chapter 1
Introduction 


The  aerospace  industry,  including  military,  space  and  commercial  users,  has  for  the 
past  several  years  faced  a  growing  challenge  in  the  use  of  commercial-off-the-shelf 
(COTS)  semiconductor  (Integrated  Circuit  (IC))  devices.  This  challenge  has  come 
from  the  electronic  industry’s  concentration  on  the  computer,  networking, 
telecommunications  and  consumer  markets  and  their  resulting  neglect  of  the  aerospace 
market [1]. Continued developments within the semiconductor industry are further
jeopardizing the ability of the aerospace industry to use future devices as they have
used COTS devices in the past. In particular, shrinking device features pose the
possibility of early wearout, resulting in semiconductor devices that have a shorter
expected lifetime than the systems in which they are incorporated.

The  maturing  of  the  semiconductor  market  over  the  last  decades  has  shifted  the 
emphasis  of  the  industry  away  from  the  defense  and  aerospace  markets  to  the 
commercial  and  consumer  markets.  Market  pressures  are  driving  a  continued  quest  for 
more  speed,  larger  memories  and  lower  costs.  This  has  been  achieved  by  pushing 
device feature sizes down to less than 0.1 micron (nanometer scale). These advances
are widening the division between the requirements of the aerospace industry (harsher
environments and longer expected lifetimes) and the needs of the consumer market.

To gain increased performance along with lower cost, device manufacturers have reduced
or eliminated the reliability margins that allowed the devices to be used in aerospace
applications. The expected lifetime of such devices in aerospace applications is being
reduced from decades to years [1]. Compounding the problem for aerospace, the details of
a manufacturer’s processes, products and data are proprietary and are shared only with
significant customers in target markets. This prevents aerospace Original
Equipment Manufacturers (OEMs) from understanding part capabilities or the risks of using
the advanced technologies. Further obstacles to using COTS devices include the
constant “improvement” of device designs, fabrication and assembly processes and test
methods. These obstacles make it a challenge to take advantage
of the advances in semiconductor technology and incorporate them into military and
commercial avionics systems.

How best to incorporate COTS semiconductor devices into aerospace
systems is a large question with technological, business and regulatory aspects. Today,
several aerospace companies are working together with government, higher education,
and semiconductor device suppliers to develop an Integrated Aerospace Parts
Acquisition Strategy (IAPAS) [1]. This work supports the IAPAS effort and examines a
small subset of the overall problem: early device wearout. The next section of this
introduction provides background information on the technological trends in the
semiconductor industry, the business climate within the aerospace industry, and how
both affect the aerospace industry. Following this is the problem statement along with
an overview of the dissertation.


1.1  Background 

1.1.1  Technological  Trends 

Technology  trends  are  driven  by  a  relentless  quest  for  more  processing  power. 
Customers  have  come  to  expect  a  constant  increase  in  power  and  functionality  from  IC 
devices while the cost remains constant or drops. To stay competitive in this market, all
device manufacturers must constantly improve their products. In the past, the rate at
which semiconductor products improved has followed Moore’s law.

In 1965, Gordon Moore, who later co-founded Intel, observed that the number of
transistors in an IC grew at an exponential rate over time [2]. As commonly quoted, his
“law” states that the number of transistors on a semiconductor device will double every
18 months. He expected this growth rate to continue into the future, and history has
proven him correct. Even today, the growth rate of transistors in future devices is
expected to continue following Moore’s law [3].
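The doubling rule can be written as N(t) = N0 · 2^(t/T), where T is the doubling period. A minimal Python sketch follows, assuming the 18-month period quoted above; the function name and the one-million-transistor starting point are illustrative and not taken from the cited sources.

```python
def projected_transistor_count(initial_count: float,
                               years_elapsed: float,
                               doubling_period_years: float = 1.5) -> float:
    """Project a transistor count assuming a fixed doubling period.

    The 1.5-year default reflects the commonly quoted 18-month doubling.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return initial_count * 2.0 ** (years_elapsed / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: a device with one million transistors.
    for years in (0.0, 1.5, 3.0, 6.0):
        count = projected_transistor_count(1_000_000, years)
        print(f"after {years:>4} years: {count:,.0f} transistors")
```

Under this assumption a count quadruples in three years and grows sixteenfold in six.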

The leading source documenting how the semiconductor industry will meet this
challenge is the International Technology Roadmap for Semiconductors (ITRS).
Published semiannually, this report is a cooperative effort of the world’s top five regional
semiconductor associations, each association comprising the leading
semiconductor device manufacturers. Its purpose is to detail the technological
directions of the semiconductor industry with the goal of continuing to meet the pace
defined by Moore’s law for the next fifteen years. The roadmap focuses on
traditional Complementary Metal-Oxide-Semiconductor (CMOS) circuits and highlights the
technologies, trends and roadblocks of the technological advance.

Figure 1.1: Intel Processor Growth. The growth in transistor count for Intel
processors followed Moore’s law [3].

The  technological  trends  in  the  semiconductor  industry  are  of  critical  importance  in 
understanding  the  impact  of  early  device  wearout.  As  the  number  of  transistors  has 
grown, the typical die size of an IC has remained constant. This means the size of the
device features on an IC die has shrunk dramatically to allow more
transistors to be placed in a device. These smaller device features, approaching a few
atoms in size, are causing the concern about lifetime. In the past, there has been
adequate margin so that wearout was never an issue. But this margin may not be there
in future devices, so it is important to understand where semiconductor technology is
headed.

The principal method the semiconductor industry has used to meet the challenge of
Moore’s law has been the constant decrease in the minimum feature size of ICs [4].
This decrease in feature size is termed ‘scaling’. While scaling has worked in the past,
CMOS devices are now reaching the point (9 nm feature sizes by 2016) where it may
be difficult to scale them further. It is at these small scales that
problems of early wearout may appear.

1.1.2  Scaling 

Scaling of semiconductors traditionally happens in a discrete fashion as manufacturers
move from one technology ‘node’ to another. A technology node is defined by the
half-pitch of the smallest device feature printed. Half-pitch is half of the pitch, where
the pitch is the spacing between adjacent device features (see Fig. 1.2). For DRAM
(Dynamic Random Access Memory), the node is defined by the half-pitch of the first-level
metallization interconnect lines. For logic devices, such as microprocessors (MPU), the
logic interconnect half-pitch refers to the first polysilicon or metallization layer. The
half-pitch of MPUs and ASICs (Application Specific Integrated Circuits) lags behind the
half-pitch employed for DRAM. Each node typically reduces the half-pitch to
approximately 70% of its value at the previous node, or 50% of its value two technology
nodes back. This is illustrated in Fig. 1.3.

Figure 1.2: Half-Pitch. The pitch of a typical DRAM and Microprocessor
(MPU)/ASIC. Half-pitch = (Pitch/2).

Figure 1.3: Half-Pitch Nodes. A schematic of the shrinkage of the half-pitch
at successive technology nodes. The vertical axis (Half-Pitch, nm) represents the
half-pitch of each technology node while the horizontal axis represents time.
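The node-to-node shrink can be tabulated directly. The sketch below assumes that each node's half-pitch is about 0.7 times that of the previous node, so that two nodes give roughly 0.5 times. The 130 nm starting value and the function name are illustrative assumptions rather than roadmap data.

```python
def half_pitch_sequence(start_nm: float, nodes: int,
                        shrink: float = 0.7) -> list[float]:
    """Return the half-pitch (nm) at each successive node.

    start_nm is the half-pitch of the first node; each later node is
    `shrink` times the one before it (0.7 is the assumed typical ratio).
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must lie between 0 and 1")
    return [start_nm * shrink ** n for n in range(nodes)]


if __name__ == "__main__":
    # Illustrative starting half-pitch of 130 nm.
    for index, value in enumerate(half_pitch_sequence(130.0, 5)):
        print(f"node N+{index}: {value:.1f} nm")
```

Because 0.7 squared is 0.49, the two-node ratio comes out close to the 50% figure quoted for nodes two levels apart.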


According to the ITRS 2001 report [4], the rate of transition between technology
nodes has accelerated from the traditional three years to two years between nodes for
microprocessors and ASICs. This rate is expected to continue until at least 2004, when a
90 nm half-pitch is reached. For DRAM, the time between nodes remains three years.

With technology nodes defined by half-pitch, the dimensions and parameters of
other device specifications may be determined by applying scaling rules. One
commonly used set of rules is that for constant field scaling. The principle of this rule
is that device voltages and dimensions are scaled by the same factor (κ) such that the
electric field (E) remains constant. Scaled parameters are defined in Table 1.1. A
diagram illustrating the use of scaling factors is shown in Fig. 1.4.

Table 1.1: Scaling Factors. Scaling factors for MOSFET device parameters [5].

                                              Multiplicative
Device Parameters                             Factor (κ > 1)
------------------------------------------------------------
Scaling Assumptions
  Device dimensions (t_ox, L, W)              1/κ
  Voltage (V)                                 1/κ
Derived device parameters
  Electric field (E)                          1
  Depletion-layer width (W_d)                 1/κ
  Capacitance (C = εA/t)                      1/κ
  Inversion-layer charge density (Q_i)        1
  Current, drift (I)                          1/κ
  Channel resistance (R_ch)                   1

Figure 1.4: Scaling Factor. Application of the scaling factor to create a scaled
device [5].
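The constant-field rule can be illustrated with a short Python sketch. It assumes that dimensions and voltage are each divided by the same scaling factor (κ > 1), and then checks that the oxide field is unchanged. The class, the function names and the sample device values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MosfetParameters:
    """A few MOSFET parameters relevant to constant-field scaling."""
    oxide_thickness_nm: float
    channel_length_nm: float
    channel_width_nm: float
    supply_voltage_v: float


def scale_constant_field(device: MosfetParameters,
                         kappa: float) -> MosfetParameters:
    """Divide dimensions and voltage by kappa (kappa > 1)."""
    if kappa <= 1.0:
        raise ValueError("kappa must be greater than 1")
    return MosfetParameters(
        oxide_thickness_nm=device.oxide_thickness_nm / kappa,
        channel_length_nm=device.channel_length_nm / kappa,
        channel_width_nm=device.channel_width_nm / kappa,
        supply_voltage_v=device.supply_voltage_v / kappa,
    )


def oxide_field_mv_per_cm(device: MosfetParameters) -> float:
    """Oxide field V / t_ox, converted from V/nm to MV/cm (x10)."""
    return device.supply_voltage_v / device.oxide_thickness_nm * 10.0


if __name__ == "__main__":
    # Hypothetical device values, chosen only for illustration.
    original = MosfetParameters(4.0, 180.0, 360.0, 1.8)
    scaled = scale_constant_field(original, kappa=2.0)
    print(scaled)
    print(oxide_field_mv_per_cm(original), oxide_field_mv_per_cm(scaled))
```

Since voltage and oxide thickness shrink by the same factor, their ratio, and hence the field, is the same before and after scaling.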
1.1.2.1 ITRS Roadmap Trends


The 2001 ITRS roadmap provides a more detailed set of predictions on scaling trends
than can be derived using the scaling factors. As an example, the half-pitch trends for
high power microprocessor devices are shown in Figure 1.5.

Figure 1.5: Node Size by Year. ITRS predicted node sizes for high power
microprocessors. The vertical axis is the half-pitch (nm); the horizontal axis is the
technology node year (2001, 2004, 2007, 2010, 2013, 2016).


1.1.2.2 Reliability

The ITRS report did not provide much detail on reliability trends. It did,
however, state that each new technology requires new materials and techniques. This
will introduce new failure regimes and defects. It leaves reliability as an area of
concern which places challenges on testing and wafer level reliability (WLR).
Difficulties listed include the reliability of very thin oxy-nitride gate dielectrics due to
high gate leakage, new gate electrode materials, non-classical CMOS structures and the
reliability of high-κ gate dielectrics.

1.1.3  Aerospace  Industry  Business  Climate 

From  a  technological  point  of  view,  there  is  no  problem  with  using  older  qualified 
designs  and  technology  in  aerospace  systems.  The  problem  in  the  aerospace  industry 
stems  from  the  business  environment.  Aerospace  companies  are  currently  faced  with  a 
swiftly  diminishing  set  of  manufacturing  sources  [1]  as  the  electronics  industry  focuses 
its  efforts  on  computer,  networking,  telecommunications  and  consumer  products.  The 
military/aerospace  markets  have  shrunk  as  a  percentage  of  the  overall  semiconductor 
market and do not carry the clout they once enjoyed. In addition, Secretary of
Defense Perry’s Acquisition Reform Memorandum (Perry Memo) [6] of 1994 strongly
encouraged  the  military  services  and  defense  contractors  to  eliminate  the  use  of 
military  specifications  and  standards. 

By eliminating the need for new mil-spec components for military applications, the
Perry memo resulted in a decline in the availability of these components for the entire
aerospace industry. The commercial side of the aerospace market was the first to
transition to COTS parts. Military products are transitioning at a slower pace [1].
Initially, the fear was that these parts would not be suitable for harsh aerospace
environments. However, it was shown that improvements in COTS parts had made
them acceptable for aerospace use in all but the most demanding applications (such as
radiation sensitive space applications).

While COTS semiconductors have proven themselves in service, there is no
guarantee they will remain up to the challenge of military/aerospace applications in the
future. To compete in their core markets, semiconductor device manufacturers must
constantly improve the price-performance ratio of their products. This is accomplished
by scaling device features down and increasing the transistor count in accordance with
Moore’s law. As device feature dimensions (half-pitch, line widths and gate lengths)
shrink to less than 0.1 micron, there is potential for impacts on aerospace users. These
impacts include [1]:

•  Service  lives  of  3-10  years  for  aerospace  applications. 

•  Possible  non-constant  failure  rates. 

•  Increased  susceptibility  to  atmospheric  radiation. 

•  Changes  in  configuration  due  to  constant  “improvements”  to  device  design  and 
manufacture. 

•  The  inability  of  aerospace  users  to  understand  the  impact  of  nanometer 
technology  used  in  aerospace  applications. 

If the aerospace industry’s concerns are confirmed, decreased IC reliability will
have a direct impact on the supportability of aerospace systems. Shorter component
lifetimes, or early wearout, would drive increased maintenance costs as failed
components and Line Replaceable Units (LRUs) must be replaced. For example, one
estimate, assuming a six year lifetime on semiconductor devices, shows that if LRUs
had to be replaced when the component devices failed, the added support cost for a
commercial aircraft could reach one million dollars a year¹. Additional costs would
include the need for redesign and certification since components in production when
the original boxes were built are unlikely to remain in production over the lifespan of
the aircraft. As the LRUs need to be replaced, the replacement boxes will have to use
the newer technology and devices available at that time.

With  these  unknowns  and  concerns,  several  aerospace  companies,  such  as  Boeing 
and  Honeywell,  are  beginning  the  process  of  formulating  an  approach  to  dealing  with 
these  issues.  One  aspect  of  this  is  to  support  research  into  the  subject.  This  consists  of 
several different tracks. First, Boeing, Honeywell and the Defense Standardization
Program Office are working on a program to develop an Integrated Aerospace Parts
Acquisition Strategy (IAPAS). Part of this program’s efforts includes collaborative
research with the Aerospace Vehicle Systems Institute (AVSI). The purpose of these
activities includes understanding and addressing the impacts of nanometer technology.
One of the projects is AVSI Project #17: Methods to Account for Accelerated
Semiconductor Device Wearout.

¹This assumes 300 avionics boxes (LRUs) per aircraft, a $20,000 cost per box and replacement of each
box every six years.
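The footnote’s estimate can be checked with a short calculation. The Python sketch below uses only the three figures stated in the footnote; the function name is an illustrative choice, and the calculation assumes replacement costs are spread evenly across the replacement interval.

```python
def annual_lru_replacement_cost(boxes_per_aircraft: int,
                                cost_per_box: float,
                                replacement_interval_years: float) -> float:
    """Average yearly cost of replacing every box once per interval."""
    if replacement_interval_years <= 0:
        raise ValueError("replacement_interval_years must be positive")
    return boxes_per_aircraft * cost_per_box / replacement_interval_years


if __name__ == "__main__":
    # Figures from the footnote: 300 boxes, $20,000 each, six-year interval.
    cost = annual_lru_replacement_cost(300, 20_000.0, 6.0)
    print(f"${cost:,.0f} per aircraft per year")
```

With these inputs the result is $1,000,000 per aircraft per year, matching the estimate quoted in the text.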
1.2 AVSI Project #17


The purpose of AVSI Project #17 is to understand the impact of shrinking device features
and their implications for device lifetime and the potential for device wearout. The
premise behind the project is that future avionics systems must be designed, produced,
operated, maintained, and supported using COTS components. But with trends in the
semiconductor industry moving counter to aerospace needs, the aerospace industry
cannot assume that the design, production or service life of individual LRUs will be
greater than 5-10 years [7]. Specifically, the three areas of interest for this project are
the inherent device failure mechanisms of electromigration, hot carrier effects and time
dependent dielectric (oxide) breakdown (TDDB).

1.2.1  Work  Packages 

AVSI Project #17 has a total of eight Work Package/Milestones (WP/MD). The
responsible parties include the AVSI members sponsoring the project and the
University of Maryland. The project was approved and formally started in Spring 2002.
My research addressed three of the work packages:

1.  Determine  Likely  Failure  Mechanisms  of  Future  Semiconductor  Devices  in 
Avionics  Applications.  Conduct  a  literature  search  and  consult  with 
semiconductor  device  manufacturers  to  determine  likely  failure  mechanisms  of 
future  semiconductor  devices  in  avionics  applications.  Obtain  design  information 
regarding expected device lifetimes from device manufacturers. The deliverable
from this WP/MD is a report with quantitative information regarding the above
topics. 

2.  Develop  Models  to  Estimate  Expected  Lifetimes  of  Future  Avionics.  Based  on 
published  information  and  roadmap  information  regarding  future  avionics 
designs,  develop  mathematical  models  to  describe  time-to-failure  of  future 
semiconductor  devices  in  aerospace  applications.  This  will  involve  making  some 
assumptions  and  customizing  the  models  to  fit  aerospace  conditions.  The 
deliverable  is  a  report  with  equations  and  sample  calculations  to  estimate 
time-to-failure  with  respect  to  the  failure  mechanisms  identified  in  WP/MD  #1. 

3.  Develop  Device  Assessment  Methods  and  Avionics  System  Design  Guidelines. 
Using  the  information  developed  in  the  previous  WP/MD,  develop  guidelines 
and,  if  necessary,  suggest  test  methods  to  evaluate  the  potential  lifetimes  of 
specific  semiconductor  devices  in  existing  and  future  avionics  systems.  Also, 
develop  design  guidelines  for  future  avionics  systems  to  minimize  effects  of 
early  device  wearout. 

1.2.2  Project  Scope 

While the WP/MDs define the goals and direction of the research, they did not explicitly
define the scope of the work. AVSI Project #17 is only a part of the Integrated
Aerospace Parts Acquisition Strategy. The scope of this research is understanding
the inherent failure mechanisms of semiconductor devices that could lead to early
wearout or a reduction in lifetime. Additionally, this project includes developing ideas
and methods to minimize the impact of any wearout potential or reduced lifetime.
Excluded from this project are extrinsic failure mechanisms such as radiation,
packaging and electrostatic discharge.

1.3  Report  Overview 

After  this  introduction,  the  next  chapter  (Ch.  2)  discusses  the  systems  engineering 
methodology  used  on  this  project.  The  following  two  chapters  will  detail  the  research. 
The first (Ch. 3) serves as a tutorial, discussing the failure mechanisms of
semiconductors and how scaling will affect these mechanisms. The next chapter
(Ch.  4)  discusses  derating  semiconductor  devices  to  increase  their  lifetime  and 
reliability.  The  last  chapter  (Ch.  5)  summarizes  the  results  to  date  and  discusses  future 
work  necessary  to  continue  supporting  this  project. 
Chapter 2

Methodology:  A  Systems  Engineering  Process 


At first glance, the problem of AVSI Project #17, understanding the impact of
shrinking semiconductor device features on device lifetime, appears to
be a reliability problem. If the root causes of the failure mechanisms and their
relationships with shrinking device features can be understood, then design, process or
manufacturing changes can be made to alleviate the problem. In actuality, however,
AVSI Project #17 is much more than just a technological problem. It is a system-level
problem with technological, engineering, business and market aspects, and it involves a
wide range of actors.

Because of this, it was appropriate to incorporate some Systems Engineering
methodologies into the research process. This chapter reviews several Systems
Engineering methodologies. It then explains why it is important to
incorporate Systems Engineering methodologies into the project and how they are
implemented.
2.1 What is Systems Engineering?


There is no simple, agreed-upon explanation of what Systems Engineering is. The
best place to start in understanding Systems Engineering is with the definition given by
the International Council on Systems Engineering (INCOSE) [8]:

“Systems  Engineering  is  an  interdisciplinary  approach  and  means  to 
enable  the  realization  of  successful  systems.  It  focuses  on  defining 
customer  needs  and  required  functionality  early  in  the  development  cycle, 
documenting  requirements,  then  proceeding  with  design  synthesis  and 
system  validation  while  considering  the  complete  problem:  Operations, 
Performance, Test, Manufacturing, Cost & Schedule, Training & Support,
Disposal.  Systems  Engineering  integrates  all  the  disciplines  and  specialty 
groups  into  a  team  effort  forming  a  structured  development  process  that 
proceeds  from  concept  to  production  to  operation.  Systems  Engineering 
considers  both  the  business  and  the  technical  needs  of  all  customers  with 
the  goal  of  providing  a  quality  product  that  meets  the  user  needs.” 

A  more  concise  definition  is  given  by  Austin  [9]:  Systems  Engineering  is  “the 
end-to-end  development — planning,  analysis  and  design,  implementation,  operation, 
retirement — of  complex  engineering  systems,  taking  into  account  engineering  and 
business  concerns”.  Both  these  definitions  mean  the  same  thing:  Systems  Engineering 
takes  into  account  all  parts  of  a  complex  system,  including  both  engineering  and 
non-engineering aspects.

Decreasing lifetime of semiconductor devices is a Systems Engineering problem
because  aerospace  companies  do  not  have  the  purchasing  clout  to  strongly  influence 
the  design  of  advanced  ICs.  This  leaves  them  in  the  position  of  having  to  purchase 
those  devices  which  are  brought  to  the  market  and  intended  for  other  types  of 
applications.  If  this  were  solely  a  reliability  problem,  then  the  design  or  manufacturing 
process  of  the  IC  devices  would  be  modified  to  increase  their  lifetime,  alleviating  the 
problem. Since this is not a practical solution, the remainder of the system must be
altered to accommodate the reliability shortfall.

As  a  Systems  Engineering  problem,  all  aspects  of  an  aerospace  system,  including 
operations  and  maintenance,  are  subject  to  consideration.  Systems  Engineering 
depends  on  a  systematic  process  for  problem  solving.  Many  Systems  Engineering  tools 
and  methodologies  have  been  proposed.  The  exact  tool  used  depends  on  the  nature  of 
the  task. 


2.2  Systems  Engineering  Methodologies 

One of the earliest Systems Engineering methodologies was proposed by Hall [10]. This
methodology serves as the basis for many later ideas on applying Systems Engineering
principles. Hall’s method is an iterative model with each iteration divided into seven
steps. Each iteration serves to improve the definition of the system concept and design.
Hall’s seven steps are:

• Problem Definition: Explicitly define the problem at hand along with constraints
and the scope.


•  Value  System  Design:  Define  a  system  to  quantitatively  or  qualitatively  score 
alternative  system  designs  and  concepts  against. 

•  Systems  Synthesis:  Create  a  set  of  system  designs  and  concepts. 

•  System  Analysis:  Define  the  systems. 

•  Modeling  and  Optimization:  Model  and  refine  the  system  concepts. 

•  Decision  Making:  Select  an  alternative,  or  set  of  alternatives,  to  bring  into  the 
next  iteration. 

•  Planning  for  Action:  Plan  the  course  of  action  for  the  next  iteration  or  for 
implementation. 

These steps are not firm, fixed rules, and many authors have proposed additions,
modifications and clarifications to this basic model. One particular difficulty with Hall’s
model is that the three steps, Systems Synthesis, System Analysis and Modeling and
Optimization, are not distinct. They tend to be parts of the same whole. Other authors
have refined Hall’s basic model. In particular, I examined the ‘Model-Based Method’
[11] derived from Hall’s model with improvements taken from papers by Sage, Hill
and Warfield, and Mosard [12, 13, 14, 15].

The  ‘Model-Based  Method’  consists  of  five  steps.  These  are: 

• Problem Definition

• Model Definition


•  Modeling  and  Analysis 

•  Decision  Making 

•  Implementation 

The last methodology I examined was developed by the Institute for Systems
Research (ISR) at the University of Maryland [9]. This methodology is a visual
modeling language, based on Unified Modeling Language (UML) diagram notation, for
systems architecting and engineering design. While it does not appear to be directly
descended from Hall’s, the ISR methodology does share many of the same inherent
features as the other methodologies I reviewed, just with different ways of
accomplishing them. Two aspects of the ISR approach that I favored as
improvements were its emphasis on traceability and its use of visual UML modeling.

Traceability refers to the process of ensuring each function and feature can be traced
back to the original requirement, either directly or as a derived requirement. Traceability
works two ways. First, it is used to ensure that all the requirements are implemented in
the design solution. Second, it ensures that the design solution does not include
extraneous features unnecessary to meet the requirements. At this point in AVSI
Project #17, traceability is not a critical tool, but as the development of a solution
progresses it will become increasingly important.
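The two directions of traceability can be expressed as two set operations. The following Python sketch is only an illustration of the idea and is not part of the ISR methodology; the function name and all requirement and feature identifiers are hypothetical.

```python
def check_traceability(requirements, feature_sources):
    """Return (unimplemented requirements, extraneous features).

    requirements: set of requirement identifiers.
    feature_sources: dict mapping each design feature to the set of
    requirements it traces back to.
    """
    # Forward check: every requirement should be covered by some feature.
    covered = set().union(*feature_sources.values())
    unimplemented = requirements - covered
    # Backward check: every feature should trace to a known requirement.
    extraneous = {
        feature
        for feature, sources in feature_sources.items()
        if not sources & requirements
    }
    return unimplemented, extraneous


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    requirements = {"R1", "R2", "R3"}
    feature_sources = {
        "derating_guideline": {"R1"},
        "lifetime_model": {"R2"},
        "unrequested_feature": set(),
    }
    print(check_traceability(requirements, feature_sources))
```

In this example, requirement R3 has no implementing feature, and one feature traces to no requirement, so both checks report a finding.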

The second feature of ISR’s methodology I liked was the use of UML modeling.
UML is an object-oriented visual modeling language typically used in software
development. The language is extensible, allowing it to be adapted to modeling
physical as well as logical systems.

2.2.1  Project  Methodology 

In examining the system methodology models, I found them to be focused on design
solutions to a problem. For this project, however, I needed a higher level of abstraction
as the starting point. The first phase of this project is to understand the nature and
implications of the problem, not necessarily to design a fix. The methodology
therefore had to accommodate this search of the problem space.

The methodologies drawn on so far are iterative. They are best suited for
developing and examining alternative solutions to a problem and then refining them
until a solution is reached. Each iteration serves to narrow the solution space (number
of alternatives) and to increase the level of detail of the remaining solutions. At the
conclusion of the process, a solution is implemented.

These methodologies were not directly suited to the needs of AVSI Project #17
at this point. This is a research project in which the true problem and needs are
unknown. The project was not well defined; in a sense, one purpose of the project was
to define itself. A spiral development model is best suited for this since it allows
knowledge gained to be brought back into the research cycle. The purpose of each spiral
is to refine the understanding of the problem and to propose solutions and/or directions
for continued work. The iterative methods provided structure for the spirals. I based each
spiral on the ‘Model-Based Method’, but consolidated the Decision Making and
Implementation steps.

As in Hall’s method, the first step of the spiral model is to define the problem. This
involves writing out a statement of the problem and defining the system
boundaries and the scope of the problem.

The  second  step  is  model  definition.  This  step  involves  building  a  model  of  the 
system.  This  step  is  where  UML  modeling  may  be  applied  to  good  advantage.  Another 
portion  of  this  step  involves  gathering  information  to  flesh  out  the  model. 

The third step, Modeling and Analysis, focuses on taking all the model inputs and
understanding their impacts on the system.

The fourth and last step is Decision Making. Here the results of the previous steps
are used to draw conclusions and make decisions on the future direction of the
research. The knowledge gained in these four steps is fed back into the spiral
development pattern and the cycle begins again.

With this model, it is possible to call a halt to the spiral if conditions warrant. An
example would be when the state of knowledge is such that the problem definition and
understanding have reached a level of detail sufficient for a solution to be developed.
At this point the methodology would continue to the implementation step and then enter
a pattern of design iterations to refine the solution. It is also possible for design
solutions to be broken out of the spiral model to be developed on their own in a
separate iterative process as the main spiral explores other alternatives. The next two
subsections describe how a Systems Engineering methodology was implemented in this
research.


2.2.2  First  Spiral:  Device  Physics 

The  first  action  in  this  spiral  is  to  define  the  problem.  The  overarching  problem 
definition  was  provided  by  AVSI  Project  #17  and  states: 

“This  project  will  develop  methods  to  evaluate  the  mechanisms  and 
accommodate  the  effects  of  accelerated  semiconductor  device  wear  out  on 
avionics  system  design,  production,  and  support;  and  develop  methods  to 
account  for  shorter  device  lifetimes  in  avionics  system  safety  and 
reliability  analysis”  [16]. 

Model  definition  begins  with  a  use  case  model.  The  initial  use  case  model,  shown  in 
Figure  2.1,  provides  a  picture  of  the  actors  involved  with  the  system.  Its  purpose  is  to 
highlight  the  different  actors  having  an  effect  (extends)  on  the  lifetime  of 
semiconductor  devices,  as  well  as  the  actors  being  impacted  (uses)  by  the  device 
lifetime.  Highlighting  the  different  actors  involved  with  the  system  demonstrates  how 
this  is  a  Systems  Engineering  problem  and  the  entire  scope  of  the  problem  must  be 
considered when understanding the problem and synthesizing solutions.

The next step in Model Definition is researching and understanding failure in
semiconductor devices. This includes determining the suitable lifetime models for the
three wearout failure mechanisms (electromigration, hot carrier effects and TDDB).
The Modeling and Analysis step involves understanding how the mechanism lifetime
models react to variations in their input parameters as well as the effect of device
scaling on the lifetime for these mechanisms. The Decision Making and
Implementation steps involve moving to the next spiral, derating the semiconductor
device for increased lifetime. The results of this step are presented in Chapter 3.

Figure 2.1: Use Case Model.

2.2.3  Second  Spiral:  Derating 

The second spiral focuses on exploring one concept for alleviating the impact of
shrinking device lifetime: derating the semiconductor devices for aerospace use. The
problem definition for this spiral is:

Model the change in semiconductor device lifetime, for a device
operated at derated conditions, from electromigration, hot carrier and
TDDB failure mechanisms. The models shall be usable by AVSI Project
#17 members, with data available to them (either from the device
manufacturers or via accelerated life testing) for the purpose of estimating
lifetime improvement from device derating.

This Model Definition step draws on the results of the first spiral to define a
derating factor for each of the failure mechanisms. The Modeling and Analysis step
involves verification of a constant failure rate assumption for each of the mechanisms
and an analysis of the derating models’ response to changes in the input variables. The
results of this step are presented in Chapter 4. The results of the Decision Making and
Implementation steps are covered in Section 5.2, Future Work.

2.3 Summary


This chapter has given a brief overview of the methodology used on this project. At this
early stage of AVSI Project #17, the most significant contribution of the Systems
Engineering approach was to consider the impact of non-technical aspects on
semiconductor device lifetime and how that will affect solutions to the problem. Future
work, such as developing specific design solutions using derated devices, will require a
more rigorous iterative Systems Engineering process.

Chapter 3

Background:  Impact  of  Scaling 


This  chapter  examines  the  effect  of  device  scaling  on  the  inherent  reliability  of 
semiconductor  devices.  It  begins  with  basic  concepts  of  failure  and  the  classification  of 
failure.  Next,  the  mechanisms  of  electromigration,  hot  carrier  degradation  and  oxide 
breakdown  are  explained  and  models  are  presented  for  predicting  mechanism 
reliability.  At  the  end  of  this  chapter,  the  impact  of  technology  node  scaling  on  lifetime 
is  discussed. 


3.1  Understanding  Failure 

3.1.1  What  is  Failure,  Degradation  and  Wear-out 

A  semiconductor  device  has  failed  when  response  parameters  from  the  device  (e.g. 
voltage,  capacitance,  resistance,  gain,  etc.)  no  longer  meet  the  design  parameters.  A 
simpler  way  of  stating  this  is  the  device  has  failed  when  it  is  in  a  physical  state  or 
condition  in  which  it  can  no  longer  perform  its  intended  function.  The  failure  may  have 
been  caused  by  a  sudden  internal  or  external  event  triggering  a  physical  change  in  the 
device, or the failure may have developed slowly as physical changes over time alter the
response  of  the  device.  The  latter  is  referred  to  as  degradation.  Wear-out  occurs  when 
degradation  reaches  the  point  where  the  device  is  considered  to  have  failed. 

A failure may be classified into one of three categories: intrinsic, extrinsic, or
electrical stress (in-circuit) failures. Of these, the area of greatest interest in this
study is the intrinsic failure mechanisms.

Intrinsic  failures  are  the  result  of  failure  mechanisms  originating  with  the 
semiconductor  device,  or  die,  and  the  processing  during  the  ‘front  end’  of 
manufacturing.  Examples  of  these  mechanisms  include  design  errors,  lithography  and 
processing  defects,  contamination  or  the  limitations  of  material  properties.  These 
defects  may  result  in  a  device  being  fatally  defective  so  it  never  functions  or  these 
defects  may  be  small  enough  so  they  are  non-lethal.  However,  stresses  from 
temperature,  voltage  and  current  flow,  along  with  humidity  and  radiation,  may  cause 
these  non-lethal  defects  to  grow  into  a  lethal  defect  resulting  in  a  failure. 

Extrinsic  failures  are  identified  with  the  interconnection  and  packaging  of  chips  in 
the  ‘back  end’  of  manufacturing.  These  types  of  failures  are  external  to  the  device 
circuitry itself. Extrinsic failures are not the subject of this research since they are
not  directly  related  to  the  device  itself  or  shrinking  device  features. 

Electrical stress failures are generally caused by discrete events and are often
considered to be ‘random’ failures. These damaging events typically occur during
handling and include Electrostatic Discharge (ESD) and Electrical Overstress (EOS).
In fact, EOS and ESD can make up over 50% of in-field failures [17]. EOS is caused by
events which occur during normal circuit operation and lead to over-voltage and
over-current stresses of long duration (greater than 1 ms), although some stress events
as short as a few µs are classified as EOS (and may be too fast for protection schemes
to prevent). The effect of EOS is typically to cause a hot spot to develop in the IC.
As it gets hotter, more current flows into the heated region and temperatures continue
to build. When the temperature approaches the melting point of Si (1688 K), failures
may occur as short circuits form in junctions or as metallization melts, creating open
circuits [17]. ESD is caused by stress extrinsic to the normal operation of a device.
An example is static charges of over 100 V. If an IC isn’t protected, these charges may
damage the gate oxides in Metal-Oxide-Semiconductor (MOS) transistors. Protection is
provided in nearly all circuits, so the typical IC failure mechanism is thermal [18].

All failure types may occur at any point in a semiconductor device’s lifetime. The
‘Bathtub’ curve, shown in Figure 3.1, represents the instantaneous failure rate at any
point over a device’s lifetime. This is a hazard rate and is measured as the number of
failures per unit time [19]. The curve is divided into three phases: infant mortality,
useful life and wearout.

For  a  large  sample  of  devices,  the  infant  mortality  phase  represents  items  that  fail 
early  due  to  manufacturing  defects.  As  defective  items  quickly  fail  and  are  removed 
from the system, the failure rate drops. Burn-in is often used to screen out items
inclined  to  fail  prematurely,  resulting  in  the  remaining  devices  having  a  longer 
expected  lifetime.  However,  burn-in  is  only  justified  so  long  as  the  failure  rate 
decreases  over  time  as  seen  in  the  infant  mortality  region  of  Figure  3.1. 

Figure 3.1: Bathtub curve (vertical axis: failure rate). The line represents the
instantaneous failure rate of a device showing the periods of infant mortality, useful
life (random failure) and wearout.

Items that survive the infant mortality phase enter a long period of relatively
constant failure rate. This period of time is termed the useful life of the
product. Failures during this period are attributed to random failures resulting from
sources such as random external events, non-lethal intrinsic defects that have grown
into lethal defects, or early wearout. The end of the product’s useful life comes as the
failure rate begins to climb.

An increasing failure rate represents the start of the wearout phase. During this
phase the physical degradation suffered throughout the working life of the item begins
to increase the failure rate.
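
The three phases of the bathtub curve can be illustrated numerically. The sketch below is not taken from this study; it assumes, purely for illustration, that the hazard rate is the sum of a decreasing Weibull hazard (infant mortality), a constant rate (random failures) and an increasing Weibull hazard (wearout). All parameter values are hypothetical.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_hazard(t, random_rate=1.0e-6):
    """Illustrative bathtub hazard (failures per hour) at age t in hours.

    The shape and scale parameters below are hypothetical values chosen
    only to reproduce the qualitative shape of a bathtub curve.
    """
    infant = weibull_hazard(t, beta=0.5, eta=2.0e5)   # decreasing with age
    wearout = weibull_hazard(t, beta=4.0, eta=1.5e5)  # increasing with age
    return infant + random_rate + wearout


for hours in (10.0, 1.0e4, 2.0e5):
    print(f"t = {hours:9.0f} h  hazard = {bathtub_hazard(hours):.3e} per hour")
```

With these assumed parameters the hazard is high for a new device, falls to a low value in mid-life and rises again late in life.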

3.1.1.1 Infant Mortality Failures


Failures  during  the  infant  mortality  period  are  typically  the  result  of  defects  introduced 
during  the  manufacturing  process.  Infant  mortality  need  not  be  a  direct  concern  of  an 
IC  user — assuming  the  manufacturer  has  a  burn-in  procedure  to  screen  out  susceptible 
parts. 

The  manufacture  of  semiconductor  devices  will  always  result  in  products  with 
defects.  Some  of  these  defects  are  insignificant,  others  serve  as  a  nucleus  of  failure  in 
the  future,  while  still  others  are  significant  enough  to  render  the  device  useless  or  to 
cause  it  to  fail  quickly.  If  the  infant  mortality  failure  rate  distribution  of  a  product  is 
known,  infant  mortality  by  itself  should  not  directly  pose  a  reliability  problem  for 
devices  placed  in  service.  Screening,  through  burn-in,  weeds  out  the  devices  that  have 
significant  manufacturing  defects.  In  a  burn-in  process,  each  manufactured  device  is 
tested  at  normal  or  accelerated  stress  conditions.  The  length  of  the  burn-in  period  is  set 
to  correspond  to  the  length  of  the  infant  mortality  period.  Devices  that  survive  the 
burn-in  are  expected  to  be  susceptible  only  to  random  and  wearout  failures.  Through 
the  use  of  conditional  probability,  it  can  be  shown  that  for  a  product  with  a  decreasing 
failure  rate,  the  expected  life  of  surviving  product  after  burn-in  is  greater  than  if  there 
had been no burn-in [20].
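
The burn-in argument can be checked numerically. The sketch below assumes a Weibull lifetime with shape parameter less than one (a decreasing failure rate) and computes the expected remaining life of a device that has survived to a given age. The Weibull form, the parameter values and the function name are illustrative assumptions, not results from [20].

```python
import math


def mean_residual_life(t0, beta, eta, n=100_000, span=50.0):
    """Expected remaining life at age t0 for a Weibull(beta, eta) lifetime.

    MRL(t0) = (1 / R(t0)) * integral from t0 to infinity of R(t) dt,
    with R(t) = exp(-(t / eta)**beta). The integral is evaluated in the
    variable x = (t / eta)**beta using the midpoint rule over
    [x0, x0 + span]; the tail beyond that is negligible for span = 50.
    """
    x0 = (t0 / eta) ** beta
    dx = span / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        # dt = (eta / beta) * x**(1/beta - 1) dx; exp(-(x - x0)) is R(t) / R(t0)
        total += math.exp(-(x - x0)) * (eta / beta) * x ** (1.0 / beta - 1.0)
    return total * dx


# Hypothetical device population: beta = 0.5 (decreasing failure rate).
no_burn_in = mean_residual_life(0.0, beta=0.5, eta=1000.0)
after_burn_in = mean_residual_life(100.0, beta=0.5, eta=1000.0)
print(f"expected life without burn-in:    {no_burn_in:.0f} h")
print(f"expected remaining life after it: {after_burn_in:.0f} h")
```

For a shape parameter of exactly one (constant failure rate) the same function returns the same value at every age, which is why burn-in only pays off while the failure rate is decreasing.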

Infant mortality is important to understand because it is an indicator of the overall
reliability of a device: defects that don’t cause initial failures often serve as
sources of failure in the future. Hence a product experiencing high infant mortality
usually has the potential for a higher failure rate during its useful life.
Additionally, high infant mortality may be indicative of inadequately controlled
manufacturing processes or design parameters that are not optimal. Likewise, yield¹ in
IC production has been shown to relate to in-service reliability [18].

3.1.2  Yield 

During  the  manufacturing  process,  there  are  several  potentially  lethal  defects  that  may 
occur,  leading  to  reduced  yield  or  infant  mortality  failures.  Among  these  are  processing 
errors,  contamination,  material  flaws,  residual  stresses  and  contact  failures.  Yield  refers 
to  the  percentage  of  devices  manufactured  that  are  suitable  for  sale  to  a  customer. 

Contamination of wafers during fabrication is an ever-present source of defects in
semiconductors. As the process size has decreased, the size of particles that may cause
defects has shrunk as well. Clean rooms have reached fantastic levels of cleanliness,
but now individual molecules, such as O₂, N₂ and H₂O, are sources of contamination.
Etching and cleaning liquids and wear particles from hardware contribute to the
contamination problem as well. The greatest impact of contamination is to yield, but
contamination creates non-lethal defects as well. Examples of contamination sources
include ion implantation, metal contamination and damage to lithography masks.

Fabrication also introduces stress into the ICs. While it isn’t a defect by itself,
stress may lead to the formation of defects. Stress results from the processing of
dissimilar materials in contact with each other. High temperatures are used during the
manufacturing process, and because every material has different thermal expansion
properties, the materials expand and contract differently during fabrication, leading to
the formation of internal stresses. Stress may also be introduced during grinding
operations to thin the backside of wafers. Defects occur when excessive tensile stresses
cause cracks to form in films or when compressive stresses cause wrinkling and the
loss of adhesion between a film and substrate.

¹ In semiconductor fabrication, yield is the percentage of functional devices
manufactured versus the total number manufactured.

Defects are not only created during the fabrication process; they may also reside
in the material itself. The silicon wafers used in manufacturing will contain intrinsic
defects [18] such as lattice vacancies, dislocations and grain boundaries. These defects
may have a catastrophic effect on the electrical properties of device features, but are
generally not a source of wearout; wearout stems instead from intrinsic defects in the
circuits themselves.

3.1.2.1 Non-lethal Defects

Yield is a good indicator of the presence of non-lethal defects in a device. A
consequence of non-lethal defects in an IC is that they leave some device features
lower in strength than average. Low yield products typically have a lower reliability in
service because of the existence of an increased number of defects [18]. Examples of
these defects are ‘near opens’ and ‘near shorts’ in the metallization and near shorts in
gate oxides and passivating insulators. Defects on a wafer, both lethal and non-lethal,
are randomly distributed (barring a systematic source of flaws). The size, or
criticality, of the defect will be random based on the location of the contamination or
flaw. The result is that non-lethal defects will grow to failure after various periods
of use, appearing to a user as random failures.


3.2  Fundamental  Failure  Processes 

To  move  from  an  operating  to  a  failed  state,  the  physical  state  of  a  semiconductor  must 
change. Without moving parts, the change in the physical state of a semiconductor
results  from  the  movement  of  electrons  within  it  along  with  the  associated  electric 
fields,  currents  and  temperatures.  Two  of  the  concepts  necessary  for  understanding 
these  physical  changes  are  the  Arrhenius  model  and  the  concept  of  diffusion. 

3.2.1  The  Arrhenius  Model 

The  Arrhenius  model  forms  the  basis  of  understanding  the  lifetime  prediction  for  many 
semiconductor  failure  mechanisms.  It  has  generally  been  accepted  that  the  best  way  to 
accelerate  failure  in  electronics  is  by  raising  the  temperature.  While  heat  by  itself  is  not 
a  mechanism  of  failure,  thermal  energy  does  contribute  to  the  acceleration  of  many 
failure mechanisms. Because some failure mechanisms are thermally activated
processes, time to failure is often modeled through the use of a temperature dependent
relationship. The standard way to do this is through the Arrhenius model. This model
was originally developed in 1889 to model the reaction rate of chemical constituents.
Over  the  years  it  has  been  used  to  model  temperature  acceleration  factors  for  electronic 
component  failure.  The  basic  form  of  the  Arrhenius  model  for  predicting  MTTF  in 
electronic devices is

$$\mathrm{MTTF} = A \exp\left(\frac{E_a}{kT}\right) \tag{3.1}$$

where A is the frequency factor, E_a is the activation energy, k is Boltzmann’s constant
(8.62 × 10⁻⁵ eV/K) and T is absolute temperature (K). The activation energy may be
described  as  an  energy  barrier  separating  the  reactants  from  products  in  a  chemical  or 
physical  process  connected  to  a  particular  failure  mechanism  [21].  Often  the  activation 
energy  of  different  failure  mechanisms  is  discussed  in  literature.  In  this  context, 
activation  refers  back  to  the  Arrhenius  equation.  The  higher  the  activation  energy,  the 
quicker  the  failure  will  occur  with  increased  temperature.  Most  failure  processes  have  a 
positive  activation  energy.  One  exception  is  hot  carrier  degradation  (see  section  3.4.2). 
However,  recently  it  has  been  found  that  oxide  related  degradation  is  not  accelerated  by 
this  model. 
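
As a minimal sketch of how the Arrhenius relation behaves, the function below evaluates MTTF = A exp(E_a / kT) for a positive and a negative activation energy. The prefactor and activation energies are hypothetical values chosen for illustration, not measured data.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # Boltzmann's constant in eV/K, as quoted in the text


def arrhenius_mttf(prefactor, activation_energy_ev, temperature_k):
    """MTTF = A * exp(Ea / (k * T)); the result has the units of the prefactor."""
    return prefactor * math.exp(
        activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k)
    )


# Hypothetical activation energies: one positive, one negative.
for ea in (0.7, -0.1):
    cool = arrhenius_mttf(1.0, ea, 300.0)
    hot = arrhenius_mttf(1.0, ea, 400.0)
    trend = "shortens" if hot < cool else "lengthens"
    print(f"Ea = {ea:+.1f} eV: lifetime {trend} when T rises from 300 K to 400 K")
```

A positive activation energy gives a lifetime that falls as temperature rises, while a negative one gives the opposite trend.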

3.2.1.1 Acceleration Factor

A  common  application  of  the  Arrhenius  model  is  in  the  development  of  Acceleration 
Factors (A_f) in conjunction with accelerated life testing. During accelerated life
testing,  the  lifetime  of  a  device  is  experimentally  determined  at  a  high  stress  condition 
(such  as  increased  temperature,  voltage,  current,  etc.).  An  Acceleration  Factor  is  used 
to  extrapolate  those  results  to  the  stress  level  at  normal  operating  conditions  in  order  to 
predict  the  lifetime  of  the  device. 

The definition of an Acceleration Factor is the “ratio of the measured failure rate of
semiconductor devices at one stress condition to the measured failure rate of identical
devices stressed at another condition” [22]. The Mean Time to Failure (MTTF) of a
device operating under accelerated stress conditions, multiplied by A_f, provides an
estimate of the device’s MTTF under normal use conditions. This defines A_f as

$$A_f = \frac{\mathrm{MTTF}_{use}}{\mathrm{MTTF}_{stress}} \tag{3.2}$$
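
When temperature is the only stress that differs between test and use, substituting the Arrhenius form of MTTF into the ratio above cancels the frequency factor and leaves A_f = exp[(E_a/k)(1/T_use - 1/T_stress)]. The sketch below evaluates that expression. The activation energy, temperatures and test result are hypothetical, and the function names are introduced here only for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # eV/K


def arrhenius_mttf(prefactor, activation_energy_ev, temperature_k):
    """MTTF = A * exp(Ea / (k * T))."""
    return prefactor * math.exp(
        activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k)
    )


def temperature_acceleration_factor(activation_energy_ev, t_use_k, t_stress_k):
    """A_f = MTTF_use / MTTF_stress for a purely Arrhenius mechanism."""
    return math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K)
        * (1.0 / t_use_k - 1.0 / t_stress_k)
    )


# Hypothetical test: Ea = 0.7 eV, stressed at 125 C, used at 55 C.
a_f = temperature_acceleration_factor(0.7, t_use_k=328.15, t_stress_k=398.15)
mttf_stress_hours = 1000.0            # assumed accelerated-test result
mttf_use_hours = a_f * mttf_stress_hours
print(f"A_f = {a_f:.1f}, estimated use MTTF = {mttf_use_hours:.0f} h")
```

The estimate is only as good as the assumed activation energy, and it presumes the same failure mechanism dominates at both temperatures.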


3.2.2  Diffusion 

Diffusion  is  the  process  whereby  particles  move  from  areas  of  a  higher  concentration  to 
areas  of  lower  concentration.  Diffusion  is  a  fundamental  failure  mechanism  in  that 
contaminant  atoms  may  change  the  electrical  characteristics  of  an  IC.  Semiconductor 
devices  function  because  the  silicon,  or  other  semiconductor  material,  is  doped  with 
other  elements  to  change  its  local  electrical  properties  in  order  to  build  gates  and  other 
electrical  components.  For  example,  doping  levels  determine  the  properties  of  n  and  p 
junctions. 

Diffusion  of  atoms  from,  or  into,  these  doped  regions  will  change  the 
characteristics.  When  the  characteristics  are  altered  too  much  a  failure  is  considered  to 
have  occurred.  Diffusion  plays  a  role  in  other  failure  mechanisms  as  well.  For 
example,  electromigration,  corrosion  and  nearly  all  thermally  activated  processes  are 
due  to  diffusion  mechanisms. 

The rate of diffusion is dependent on three factors: temperature, diffusivity of the
migrating atoms and concentration of the migrating atoms. By examining the physics
of diffusion we can see how these factors apply. An explanation of the physics begins
with Fick’s law [18],

$$J_m = -D \frac{dC}{dx} \tag{3.3}$$

Fick’s law defines the diffusion process where the flux in the positive x direction is
given by J_m, D is the diffusion coefficient, and dC/dx is the concentration gradient.

The diffusion coefficient is dependent on a number of factors. These include the nature
of the diffusing atoms, the matrix, the transport method (lattice, grain boundary,
dislocation, surface, interstitial, etc.), temperature and the concentration of the
diffusing species. The relation of D to temperature is given by an Arrhenius type
equation

$$D = D_0 \exp\left(-\frac{E_D}{RT}\right) \tag{3.4}$$

with R being the gas constant and E_D the activation energy for diffusion.

The equation for non-steady state diffusion in one dimension is

$$\frac{\partial C(x,t)}{\partial t} = D \frac{\partial^2 C(x,t)}{\partial x^2} \tag{3.5}$$

when D is constant. This equation may be solved using standard partial differential
equation techniques. The first step is to define the initial conditions by assuming a
semi-infinite matrix with an initial concentration of C_0 atoms. More atoms enter the
matrix at x = 0 from a concentration of C_s. Next we assign the boundary conditions:
C(x, 0) = C_0, C(0, t) = C_s and C(∞, t) = C_0. The solution is

$$\frac{C(x,t) - C_0}{C_s - C_0} = \operatorname{erfc}\left(\frac{x}{(4Dt)^{1/2}}\right) = 1 - \frac{2}{\pi^{1/2}} \int_0^{x/(4Dt)^{1/2}} \exp(-z^2)\,dz \tag{3.6}$$
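
The constant-source solution can be evaluated directly with the complementary error function in Python's standard library. The sketch below is illustrative only; the diffusion coefficient, time and concentrations are hypothetical values in arbitrary but consistent units.

```python
import math


def constant_source_profile(x, t, d, c_surface, c_initial):
    """Concentration C(x, t) in a semi-infinite solid whose surface is held
    at c_surface and whose initial concentration is c_initial."""
    return c_initial + (c_surface - c_initial) * math.erfc(
        x / math.sqrt(4.0 * d * t)
    )


# Hypothetical values: D in cm^2/s, t in s, x in cm.
d, t = 1.0e-14, 3600.0
for x_cm in (0.0, 5.0e-6, 1.0e-5, 1.0e-4):
    c = constant_source_profile(x_cm, t, d, c_surface=1.0, c_initial=0.0)
    print(f"x = {x_cm:.1e} cm  C = {c:.4f}")
```

The profile equals the surface concentration at x = 0 and decays to the initial concentration deep in the material, as the boundary conditions require.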
If an instantaneous surface concentration (S) is assumed rather than a constant (C_s), a
simpler solution can be found. This is a Gaussian solution,

$$C(x,t) = \frac{S}{(\pi D t)^{1/2}} \exp\left(-\frac{x^2}{4Dt}\right) \tag{3.7}$$

The extent of diffusion penetration at any given time may be estimated using

$$x^2 = 4Dt \tag{3.8}$$
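
A short sketch ties the temperature dependence of D to the penetration estimate. It assumes the usual Arrhenius form D = D_0 exp(-E_D/RT) and the Gaussian instantaneous-source profile; the values of D_0, E_D and the time are hypothetical, and the function names are introduced here for illustration.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)


def diffusivity(d0, e_d_j_per_mol, temperature_k):
    """D = D0 * exp(-E_D / (R * T))."""
    return d0 * math.exp(-e_d_j_per_mol / (GAS_CONSTANT * temperature_k))


def penetration_depth(d, t):
    """Characteristic depth x = sqrt(4 * D * t)."""
    return math.sqrt(4.0 * d * t)


def instantaneous_source_profile(x, t, d, s):
    """Gaussian profile C(x, t) = S / sqrt(pi * D * t) * exp(-x^2 / (4 * D * t))."""
    return s / math.sqrt(math.pi * d * t) * math.exp(-x * x / (4.0 * d * t))


# Hypothetical values: D0 in cm^2/s, E_D in J/mol, one year of operation in s.
one_year_s = 3.15e7
for temp_k in (350.0, 400.0):
    d = diffusivity(d0=0.1, e_d_j_per_mol=1.2e5, temperature_k=temp_k)
    depth = penetration_depth(d, one_year_s)
    print(f"T = {temp_k:.0f} K  D = {d:.2e} cm^2/s  depth = {depth:.2e} cm")
```

Because the depth grows with the square root of time, quadrupling the time only doubles the penetration, while a modest temperature increase can raise D by orders of magnitude.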


3.3  Modeling  Failure 

The failure of semiconductors may be modeled in one of three ways: through the physics
of failure, through simulation or via statistical models. All three methods have limits
and weaknesses.

Predicting  the  failure  of  a  device  from  first  physical  principles  is  intellectually 
appealing.  If  you  could  accomplish  this,  you  would  have  a  complete  picture  of  how  a 
device  would  fail.  This  would  allow  you  to  precisely  trade-off  design  and  performance 
specifications  for  the  necessary  reliability.  However,  as  a  practical  matter,  a  quick 
review  of  the  literature  about  any  given  failure  mechanism  will  show  that  there  is 
usually  insufficient  knowledge  to  completely  model  the  physics.  This  uncertainty 
drives the failure prediction from being a deterministic problem to a probabilistic one.

Simulation, either discrete event or continuous, allows probabilistic effects to be
incorporated into a physics of failure model. Where detailed knowledge of the physics
is lacking, probabilistic models of the effects may be substituted. Simulation is
certainly possible for device manufacturers who have detailed knowledge about their
devices.  This  detailed  knowledge  is  typically  proprietary  information,  meaning  other 
parties  will  have  less  information  to  work  with,  making  simulation  more  difficult. 
Another  weakness  of  simulations  is  the  need  to  fully  understand  and  model  all 
interactions.  This  can  easily  result  in  complex  models.  According  to  Huntington  [23], 
“Either  one  takes  all  the  variables  into  consideration,  then  the  problem  is  most  likely 
not  solvable,  or  one  restricts  the  consideration  to  a  manageable  degree,  in  which  case 
the  model  is  probably  not  accurate.” 

Without  detailed  knowledge  of  the  device  physics,  the  best  way  to  represent  device 
failures is through probabilistic statistical models. These models use observed
relationships  between  failure  times  and  various  input  parameters  to  generate 
probabilistic  assessments  of  when  failure  may  occur. 

3.3.1  Reliability  Distributions 

The probabilistic lifetime of semiconductor devices is quantified using a variety of
terms. The most basic of these is the failure rate (λ), defined as the number of
failures per unit time. This corresponds to the failure, or hazard, rate shown in the
bathtub curve. Assuming a constant failure rate (as seen during the useful life phase),
the reciprocal of λ is the Mean Time Between Failure (MTBF = 1/λ). Since failure rates
are not always constant, a more generalized term, Mean Time to Failure (MTTF), is
also used. The standard method of quantifying semiconductor device reliability is via
FIT (Failure in Time) rates. FIT rates are defined as the number of failures per 10⁹
device-hours, e.g. 1 FIT = 1 failure in 10⁹ device-hours [24, 25].
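
The relationships among FIT rate, failure rate and MTBF can be written as simple conversions. The sketch below assumes a constant failure rate, which is what makes MTBF the reciprocal of λ; the fleet size and operating hours in the example are hypothetical.

```python
DEVICE_HOURS_PER_FIT = 1.0e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_failure_rate(fit):
    """Failure rate (failures per device-hour) for a given FIT rate."""
    return fit / DEVICE_HOURS_PER_FIT


def fit_to_mtbf_hours(fit):
    """MTBF in hours, assuming a constant failure rate (MTBF = 1 / lambda)."""
    return DEVICE_HOURS_PER_FIT / fit


def expected_failures(fit, n_devices, hours_each):
    """Expected number of failures in a fleet, assuming a constant failure rate."""
    return fit_to_failure_rate(fit) * n_devices * hours_each


# Hypothetical fleet: 1000 devices rated at 100 FIT, each run for 10,000 hours.
print(f"MTBF = {fit_to_mtbf_hours(100.0):.2e} h")
print(f"expected failures = {expected_failures(100.0, 1000, 1.0e4):.2f}")
```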

The  reliability  of  a  semiconductor  is  only  partially  defined  by  its  lifetime.  To  know 
the  probability  that  a  semiconductor  device  is  functioning  at  a  given  point  in  time 
requires  a  failure  rate  distribution.  For  this  purpose,  two  of  the  most  popular 
distributions  used  are  the  Weibull  and  exponential.  The  Weibull  distribution  is  used 
because  of  its  ability  to  assume  a  wide  variety  of  shapes,  including  decreasing, 
increasing  and  constant  failure  rate  models.  For  a  constant  failure  rate,  the  Weibull 
distribution  reduces  to  the  exponential  model. 
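
The flexibility of the Weibull distribution comes from its shape parameter. The sketch below uses the common two-parameter form R(t) = exp[-(t/η)^β], which is an assumption since the text does not state a parameterization, and shows that β = 1 reproduces the exponential model with λ = 1/η. Parameter values are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability that a device is still functioning at time t."""
    return math.exp(-((t / eta) ** beta))


def exponential_reliability(t, failure_rate):
    """Reliability under a constant failure rate."""
    return math.exp(-failure_rate * t)


def hazard_trend(beta):
    """Direction of the Weibull failure rate implied by the shape parameter."""
    if beta < 1.0:
        return "decreasing"
    if beta > 1.0:
        return "increasing"
    return "constant"


eta = 1000.0  # hypothetical characteristic life in hours
for beta in (0.5, 1.0, 3.0):
    r = weibull_reliability(500.0, beta, eta)
    print(f"beta = {beta}: failure rate is {hazard_trend(beta)}, R(500 h) = {r:.3f}")
```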

The exponential model has many useful properties. First, it is memoryless,
meaning the probability of failure in any given interval is independent of how long
the device has been operating previously. Second, it is easy to calculate with, since
only a single parameter is needed to describe the distribution. Additionally, the
exponential distribution is often a good model for the failure of systems with a large
number of components. Within a system, each component’s failure occurs randomly,
according to that component’s various failure modes and expected life distribution.
Given a sufficiently large set of components and various failure modes, the times of
the individual random component failures average out to a constant system failure
rate [26].
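
The memoryless property can be checked with a conditional survival probability: the chance of completing a further mission of length t, given survival to age s, is R(s + t)/R(s). The sketch below compares an exponential model with a Weibull model that has an increasing failure rate. Both sets of parameters are hypothetical.

```python
import math


def exponential_reliability(t, failure_rate=1.0e-3):
    """R(t) for a constant failure rate (per hour)."""
    return math.exp(-failure_rate * t)


def wearout_reliability(t, beta=2.0, eta=1000.0):
    """R(t) for a Weibull model with an increasing failure rate (beta > 1)."""
    return math.exp(-((t / eta) ** beta))


def conditional_survival(reliability, age, mission):
    """P(survive a further `mission` hours | already survived `age` hours)."""
    return reliability(age + mission) / reliability(age)


for age in (0.0, 500.0, 5000.0):
    p_exp = conditional_survival(exponential_reliability, age, 100.0)
    p_wear = conditional_survival(wearout_reliability, age, 100.0)
    print(f"age = {age:6.0f} h  exponential: {p_exp:.4f}  wearout: {p_wear:.4f}")
```

For the exponential model the result does not depend on age, whereas for the wearout model an older device is less likely to complete the same mission.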

3.4 Wear-out Mechanisms


Semiconductor device wearout concerns three intrinsic mechanisms: electromigration,
oxide breakdown (TDDB) and hot carrier (HC) effects, the last encompassing hot carrier
degradation (HCD) and negative bias temperature instability (NBTI).

3.4.1  Electromigration 

As electrons pass through an electrical conductor they will transfer some of their
momentum to the conductor’s atoms. If the current density is high enough some of
those atoms will be pushed along by the electron flow, depleting material at the cathode
side and building up excess material at the anode. This diffusive process is known as
electromigration and results in failure through damage from the formation of open
circuits, increased electrical resistance or short circuits.

Failures  from  electromigration  occur  as  voids  and/or  hillocks  form  within  the 
semiconductor  device.  Short  circuits  are  the  result  of  hillocks  breaking  the  oxide  layer, 
allowing  the  conductor  to  come  in  contact  with  other  device  features.  Alternatively, 
voids  and  microcracks  may  increase  the  resistance  in  a  conductor  as  the  cross  sectional 
area  is  reduced.  While  the  increased  resistance  alone  may  result  in  device  failure,  the 
increase  in  local  current  density  and  temperature,  resulting  from  the  increased 
resistance,  can  lead  to  thermal  runaway  and  catastrophic  failure  [27],  such  as  an  open 
circuit  failure.  Other  types  of  damage  include  whiskers,  thinning,  localized  heating, 
and  cracking  of  the  passivating  dielectrics  [18]. 

Figure 3.2: Electromigration in a wire (diagram labels: Material Depletion, Anode,
Material Build-up). This figure shows how material is depleted at the cathode and
deposited at the anode. The effect is exaggerated in this illustration.

Electromigration can occur in any conductor when high current densities are
present (greater than 10⁵ A/cm² [21]). All powered metals within a semiconductor are
potentially susceptible to electromigration. In particular, the areas of greatest concern
are the thin-film metallic interconnects between device features, contacts and vias [18].
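
A quick calculation shows how easily a thin-film interconnect can exceed the current density quoted above. The sketch below is illustrative only: it assumes a rectangular cross section, and the current and line dimensions are hypothetical.

```python
EM_CONCERN_A_PER_CM2 = 1.0e5  # threshold quoted in the text [21]
CM_PER_UM = 1.0e-4


def current_density_a_per_cm2(current_a, width_um, thickness_um):
    """Current density in a conductor with a rectangular cross section."""
    area_cm2 = (width_um * CM_PER_UM) * (thickness_um * CM_PER_UM)
    return current_a / area_cm2


# Hypothetical interconnect carrying 1 mA, before and after a 2x linear shrink.
for width_um, thickness_um in ((0.5, 0.5), (0.25, 0.25)):
    j = current_density_a_per_cm2(1.0e-3, width_um, thickness_um)
    flag = "above" if j > EM_CONCERN_A_PER_CM2 else "below"
    print(f"{width_um} um x {thickness_um} um: J = {j:.1e} A/cm^2 ({flag} threshold)")
```

If the current is held fixed, halving both cross-sectional dimensions quadruples the current density, which is one reason interconnect scaling raises electromigration concerns.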

3.4.1.1 Physics of Failure

At  the  atomic  level  there  are  two  competing  forces  operating  on  a  conductor  [28].  The 
first  is  a  ‘direct’  force  resulting  from  the  electrical  field.  This  force  exerts  an 
electrostatic  pull  on  positively  charged  ion  cores  toward  the  cathode.  The  second  force 
is  the  ‘wind’  force  due  to  the  scattering  of  electrons  off  the  ions.  This  force  acts  in  the 
opposite  direction,  toward  the  anode.  At  high  current  densities  the  ‘wind’  force  is 
stronger  than  the  ‘direct’  force  so  the  diffusion  of  the  ions  is  biased  in  the  direction  of 
the  electron  flow  (anode  or  positive).  Together  these  forces  are  referred  to  as  the 
‘electron-wind’  force. 

The  electromigration  effects  of  the  electron-wind  depend  on  the  material 
characteristics  of  the  conductor.  The  activation  energy  for  electromigration  is 
dependent  on  properties  such  as  the  material  type,  the  size  and  orientation  of  the  grains, 
stress,  heating  and  even  the  length  of  the  conductor.  For  instance,  even  small  additions 
of  one  material  to  a  second  can  have  a  great  impact  on  the  conductor’s  lifetime.  As  an 
example, pure bulk Al has an activation energy of 1.4 eV, but the addition of small
amounts of Cu (0.3-5 wt% Cu) can reduce this activation energy by about 0.5-0.8 eV
[18]. 
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Grain  size  and  pattern  also  have  a  large  impact  on  the  effective  activation  energy  of 
the material. For instance, in thin films with a large grain size the activation energy ranges
from 1-2 eV. For very fine grained samples, the activation energy may be as low as
0.4-0.6 eV. This illustrates the existence of grain boundary mass transport-induced
damage.  The  damage  is  greatest  at  the  triple  points  where  three  or  more  grains  meet. 
These  points  act  as  nuclei  for  electromigration.  A  grain  pattern  eliminating  these 
increases  the  electromigration  resistance  of  a  conductor.  Triple  points  are  eliminated  as 
the  conductor  linewidth  decreases  to  a  size  smaller  than  the  grain  size.  At  this  point  the 
line  grain  begins  to  resemble  bamboo,  i.e.  grains  lined  up  end  to  end.  This  is  illustrated 
in  Figure  3.3.  The  effect  is  that  for  aluminum  interconnects,  the  electromigration 
lifetime increases as linewidths shrink below 2 μm.


Figure 3.3: ‘Bamboo’ Grain Pattern. On the left is a typical grain pattern (panel label: Regular Grain Pattern) with a triple point highlighted. On the right is a ‘bamboo’ pattern (panel label: Bamboo Grain Pattern) [21].

Another  parameter  affecting  electromigration  is  stress  gradients  within  the  metal.  A 
stress  gradient  can  induce  atomic  motion  within  a  material.  Atoms  migrate  from 
regions of compressive stress to regions of tensile stress. One effect of this
stress-induced force is that edge migration ceases when a conductor is shorter than a
critical length, $L_c$, because the stress-induced flow of atoms counters the
electromigration movement. The result for any given current density is a critical length
of conductor below which electromigration ceases [29]. This is known as the ‘Blech
Length’.
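
The critical-length condition can be illustrated with a short sketch. It assumes the common simplification that the threshold is a constant product of current density and conductor length, so that the critical length is inversely proportional to current density. The function names and the parameter `blech_product_a_per_cm` are hypothetical, and the numeric value in the usage lines is a placeholder rather than a measured figure.

```python
def critical_length_cm(current_density_a_per_cm2, blech_product_a_per_cm):
    """Length (cm) below which electromigration is assumed to cease."""
    if current_density_a_per_cm2 <= 0:
        raise ValueError("current density must be positive")
    return blech_product_a_per_cm / current_density_a_per_cm2


def is_below_blech_length(length_cm, current_density_a_per_cm2, blech_product_a_per_cm):
    """True when the conductor is shorter than the critical length."""
    limit = critical_length_cm(current_density_a_per_cm2, blech_product_a_per_cm)
    return length_cm < limit


# Placeholder threshold product; a real value must come from test data.
PRODUCT = 2000.0  # A/cm (hypothetical)
lc = critical_length_cm(1.0e6, PRODUCT)  # 2000 / 1e6 = 0.002 cm
print(f"critical length: {lc * 1e4:.1f} micrometers")
```

Doubling the current density in this sketch halves the critical length, which is the inverse relationship the text describes.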

In  addition  to  stress  gradients,  temperature  gradients  also  have  an  effect  on 
electromigration.  Joule  heating  from  high  RMS  currents  can  create  thermal  gradients. 
While  these  gradients  may  only  cover  a  few  tens  of  degrees  temperature  change,  the 
temperature  change  over  a  few  microns  results  in  large  gradients  [29].  Since 
electromigration  is  a  thermally  activated  process,  the  temperature  gradients  produce 
flux  divergences  like  those  found  at  contacts  or  other  device  features. 

Increasingly,  ICs  have  been  making  use  of  low  resistivity  Cu  interconnects.  With  its 
lower  atomic  diffusivity,  Cu  would  be  expected  to  demonstrate  a  substantially  improved 
resistance to electromigration and electromigration-induced failure [30]. This has
not been the case, however; reliability improvements have been less than expected. The surface
self-diffusion in copper appears to be faster than grain-boundary self-diffusion. Thus
the surfaces provide high diffusivity paths bypassing the grain boundaries, resulting in
the insensitivity of copper’s electromigration lifetime to different grain structures.
Hau-Riege and Thompson [30] suggest that the reliability of Cu interconnects could be
improved  by  suppressing  the  interface  and  surface  diffusion.  This  would  allow  the 
grain  structure  to  affect  the  electromigration  reliability  of  Cu  as  it  does  in  Al. 

Other  processes  have  also  been  applied  to  the  fabrication  of  Cu  interconnects,  the 
damascene scheme in particular. An investigation by Yokogawa [31] showed that the
reliability of Cu interconnects is 50 times that of reactive ion etching (RIE) Al-Cu
interconnects.  He  also  observed  single-level  damascene  Cu  interconnects  providing  a 
30  times  longer  lifetime  than  multi-level  damascene  Cu  interconnects  at  the  same 
current  density. 


3.4.1.2 Lifetime Prediction


Modeling  electromigration  median  time  to  failure  (MTTF)  from  the  first  principles  of 
the  failure  mechanism  is  difficult.  While  there  are  many  competing  models  attempting 
to  predict  time  to  failure  from  first  principles,  there  is  no  universally  accepted  model. 
Currently,  the  favored  method  to  predict  time  to  failure  is  an  approximate  statistical 
one — Black’s  equation. 

Using  Black’s  equation,  the  MTTF  is  described  by 


\[ \mathrm{MTTF} = A\, j_e^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad (3.9) \]

where $j_e$ is the current density (A/cm$^2$), $E_a$ is the activation energy, $k$ is Boltzmann's constant and $T$ is the absolute temperature. Failure times are
described by the log-normal distribution [32]. A variation of Black’s equation [22],

\[ \mathrm{MTTF} = A\, (j_e - j_{crit})^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad (3.10) \]

accounts for the Blech length. Here, $j_{crit}$ represents the critical current density
required for electromigration with this value being inversely related to the Blech length.
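
As a minimal sketch of how Eqs. 3.9 and 3.10 might be evaluated, the function below computes the Black's-equation MTTF with an optional critical current density. The function name, argument names and the treatment of currents at or below `j_crit` as an infinite lifetime are assumptions made for illustration; the prefactor `a` must come from test data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(a, j_e, n, e_a_ev, temp_k, j_crit=0.0):
    """Median time to failure from Black's equation.

    With j_crit = 0 this is the form of Eq. 3.9; a nonzero j_crit gives
    the Blech-length variant of Eq. 3.10. Returns math.inf when j_e does
    not exceed j_crit, i.e. no electromigration is predicted.
    """
    if temp_k <= 0:
        raise ValueError("temperature must be in kelvin and positive")
    drive = j_e - j_crit
    if drive <= 0:
        return math.inf
    return a * drive ** (-n) * math.exp(e_a_ev / (BOLTZMANN_EV_PER_K * temp_k))


# Relative comparison only: the unknown prefactor cancels in a ratio.
ratio = black_mttf(1.0, 2.0e5, 2, 0.6, 358.15) / black_mttf(1.0, 1.0e5, 2, 0.6, 358.15)
print(f"MTTF ratio when current density doubles (n = 2): {ratio:.2f}")
```

Because the prefactor cancels, ratios of this kind are often more useful than absolute values when the constant A is not known.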

Black’s equation assumes activation energy is independent of line width and
temperature. The symbol A is a constant dependent on a number of factors, including
grain size, line structure and geometry, test conditions, current density, thermal history,
etc. Black determined the value of n to be n = 2. However, n is highly dependent on
residual stress [25] and current density [18]. There is a great deal of disagreement on
the values of n. Jensen [21] gives a range of 1-3 while Ohring [18] shows n reaching
values as high as 10 when $j_e$ approaches $1 \times 10^7$ A/cm$^2$. JEDEC [25] considers n = 2
to be valid around $1$-$2 \times 10^6$ A/cm$^2$, with the possibility of n ranging from 1-2 with
variations in residual stress.

JEDEC  also  provides  a  range  of  values  for  the  activation  energy,  Ea,  of  aluminum 
(Al) and aluminum alloys. The typical value is Ea = 0.6 eV with a range of 0.5-0.7 eV
[25].  The  activation  energy  can  vary  due  to  mechanical  stresses  caused  by  thermal 
expansion.  This  effect  resembles  a  temperature  dependent  activation  energy  and  can 
produce  errors  on  the  order  of  0.1  eV.  This  effect  is  most  noticeable  in  narrow 
interconnect  lines  under  thick  passivation. 

Other estimates have been provided in the literature. Investigation of Al-0.5% Cu
interconnects [33] provided estimates of n = 2.63 and an activation energy of
Ea = 0.95 eV. For multi-level damascene Cu interconnects, the activation energy was
Ea = 0.94 ± 0.11 eV at a 95% confidence interval (CI) and the value of the current
density exponent was found to be n = 2.03 ± 0.21 (95% CI) [31].

3.4.1.3 Lifetime Distribution Model

The  traditional  lifetime  distribution  used  for  electromigration  has  been  the  lognormal. 
Most  test  data  appears  to  fit  well  to  a  lognormal  distribution,  but  this  data  is  typically 
for  the  failure  time  of  a  single  conductor  [34].  Through  the  testing  of  over  75,000 
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Al(Cu)  connectors,  Gall  et  al  [34]  showed  that  the  electromigration  failure  mechanism 
does  follow  the  lognormal  distribution.  This  is  valid  for  the  time  to  failure  of  the  first 
link  with  the  assumption  that  the  first  link  failure  will  result  in  device  failure.  The 
limitation  is  that  a  lognormal  distribution  is  not  scalable.  A  device  with  different 
numbers  of  links  will  fail  with  a  different  lognormal  distribution.  Thus  a  measured 
failure  distribution  will  only  be  valid  for  the  device  on  which  it  is  measured. 

Additionally, in this study, Gall also showed that the Weibull (and thus the
exponential) distribution is not a valid model for electromigration by demonstrating
that as the number of possible failure links in a device increases, the spread of failure
times decreases, meaning the β shape parameter from the Weibull distribution would
have to change with the number of links. If the Weibull distribution were a valid model, β would
remain constant regardless of the number of possible link failures and only the
characteristic life would change.
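
The scaling argument rests on a standard property of the Weibull distribution: the first failure among N independent, identical Weibull links is again Weibull with the same shape parameter and a characteristic life reduced by a factor of N^(-1/β). The sketch below checks this numerically; the function names and the example parameters are hypothetical.

```python
import math


def weibull_survival(t, eta, beta):
    """Probability that a single Weibull link survives past time t."""
    return math.exp(-((t / eta) ** beta))


def weakest_link_parameters(eta, beta, n_links):
    """Weibull parameters of the first failure among n identical, independent links."""
    return eta * n_links ** (-1.0 / beta), beta


def chain_survival(t, eta, beta, n_links):
    """Survival of the whole chain: every one of the n links must survive."""
    return weibull_survival(t, eta, beta) ** n_links


eta_n, beta_n = weakest_link_parameters(1000.0, 2.0, 100)
for t in (10.0, 50.0, 200.0):
    # The two columns agree: same shape parameter, rescaled characteristic life.
    print(t, chain_survival(t, 1000.0, 2.0, 100), weibull_survival(t, eta_n, beta_n))
```

A data set in which the spread of failure times narrows as links are added cannot satisfy this relation, which is the basis of the argument against the Weibull model.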

Even  though  the  lognormal  distribution  is  the  best  fit  for  predicting  the  failure  of  an 
individual  device  due  to  electromigration,  the  exponential  model  is  still  applicable  for 
modeling  electromigration  failure  in  a  system  of  many  devices.  This  is  due  to  the 
usefulness  of  the  exponential  distribution  in  modeling  the  failure  rate  of  large  systems 
(see  Section  3.3.1). 

3.4.1.4 Lifetime Sensitivity

The  sensitivity  of  the  electromigration  lifetime  can  be  observed  by  plotting  the  lifetime 
against  varying  values  of  the  input  parameters.  For  electromigration,  the  most 
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significant  input  parameters  corresponding  to  lifetime  are  temperature  ( T )  and  current 
density  (je).  Lifetime  may  be  non-dimensionalized  by  using  an  acceleration  factor. 

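Substituting Black’s equation (Eq. 3.9), and assuming an exponential failure
distribution, into

\[ A_f = \frac{\lambda_{rated}}{\lambda} \qquad (3.11) \]

provides the acceleration factor for electromigration,

\[ A_{f,EM} = \left(\frac{j_e}{j_{e,rated}}\right)^{-n} \exp\left[\frac{E_a}{k}\left(\frac{1}{T} - \frac{1}{T_{rated}}\right)\right] \qquad (3.12) \]
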


Assuming some nominal values for $E_a$, $j_e$ and $T$ [22] we can plot the response of the
acceleration factor versus scaled input parameters. Figure 3.4 shows how $A_f$ changes
for scaled values (scaling multiplier times rated value) of $T$ and $j_e$ ranging from 0.8-1.2
times the rated values. In this example, $T$ has a much greater impact on $A_f$ than $j_e$.
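
The sensitivity comparison can be sketched numerically. The code below assumes the acceleration factor is the ratio of the Black's-equation lifetime at the scaled condition to the lifetime at the rated condition, so the prefactor A cancels, and it assumes the temperature multiplier is applied to absolute temperature. Both are assumptions of this sketch; the nominal values are those quoted with Figure 3.4, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # eV/K

# Nominal values quoted with Figure 3.4
E_A_EV = 0.8
T_RATED_K = 85.0 + 273.15
N_EXP = 2.0


def af_em(temp_scale=1.0, current_scale=1.0):
    """Assumed acceleration factor: MTTF(scaled) / MTTF(rated)."""
    temp_k = temp_scale * T_RATED_K  # assumption: multiplier scales absolute temperature
    thermal = math.exp((E_A_EV / BOLTZMANN_EV_PER_K) * (1.0 / temp_k - 1.0 / T_RATED_K))
    return current_scale ** (-N_EXP) * thermal


for s in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(f"{s:.1f}  T scaled: {af_em(temp_scale=s):.3e}  j scaled: {af_em(current_scale=s):.3f}")
```

Under these assumptions the temperature column spans several orders of magnitude over the 0.8-1.2 range while the current-density column stays within a factor of about two of unity.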


3.4.1.5 Outlook

As  device  features  continue  to  shrink,  and  the  energy  densities  within  interconnects 
grow,  electromigration  will  remain  a  concern.  New  technologies  and  techniques,  such 
as  Cu  interconnects  or  designing  under  the  Blech  length,  may  reduce  the  impact  of 
increasing  densities.  Historically  however,  as  electromigration  problems  are 
eliminated,  new  performance  demands  materialize  that  require  increased  interconnect 
reliability  under  conditions  where  the  metallization  has  decreased  inherent  reliability 
[29]. Because of this, electromigration will remain a design and wearout issue in future
semiconductor designs.

Figure 3.4: $A_{f,EM}$ versus scaling multiplier for $T$ (dashed line) and $j_e$ (solid
line). $E_{a,EM}$ = 0.8 eV, $T$ = 85°C, $j_e$ = $2.5 \times 10^5$ A/cm$^2$, n = 2.

3.4.2  Hot  Carrier  Effects 

Hot  Carrier  Effects  are  manifested  in  two  distinct  wearout  mechanisms.  These  are  Hot 
Carrier  Degradation  (HCD)  and  Negative  Bias  Temperature  Instability  (NBTI).  Hot 
carrier  effects  are  the  result  of  high  energy  carriers,  either  holes  or  electrons,  entering 
the  gate  oxide  of  a  transistor  leading  to  the  degradation  of  the  oxide’s  properties.  Hot 
carriers  are  produced  as  current  flows  through  the  channel  from  the  source  to  the  drain. 
A  small  number  of  these  hot  carriers  gain  enough  energy  to  be  injected  into  the  gate 
oxide.  This  results  in  charge  trapping  and  the  generation  of  interface  states.  Over  time 
this  leads  to  a  shift  in  the  performance  characteristics  of  the  device  and  eventually  to  a 
reduction  in  performance.  This  is  referred  to  as  HCD.  Device  lifetime  can  be 
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determined  by  defining  failure  as  a  percentage  shift  in  threshold  voltage,  change  in 
transconductance,  or  a  variation  in  drive  or  saturation  current.  NBTI  is  caused  by  hole 
trapping  and  interface  state  generation.  This  results  in  threshold  voltage  shifts  and 
delays  within  a  CMOS  device  [35]. 

As  device  feature  sizes  continue  to  shrink,  hot  carrier  effects  are  expected  to  be  an 
increasing  source  of  concern  [36].  The  rate  of  hot  carrier  degradation  is  directly  related 
to the length of the channel, oxide thickness and the voltage of the device. Because
device operating voltages are chosen for optimal performance at a given
life, voltage scaling has not kept pace with the reduction in channel length. The result has been
an increase in current density with a corresponding increase in semiconductor device
susceptibility to hot carrier effects.

3.4.2.1 Physics of Failure

Hot carriers are generated during the operation of a semiconductor device, as it switches
during a transition. As carriers travel through the channel from source to drain, the
lateral electric field near the drain junction causes carriers to become hot [37]. A small
percentage of these hot carriers gain sufficient energy (higher than the Si-SiO2 energy
barrier of about 3.7 eV) and the proper direction of travel to be injected into the gate
oxide.  In  nMOS  (negative-channel  metal-oxide  semiconductor)  devices,  hot  electrons 
are  generated  while  hot  holes  are  produced  in  pMOS  (positive-channel  metal-oxide 
semiconductor)  devices.  Injection  of  either  carrier  results  in  three  primary  types  of 
damage:  trapping  of  electrons  or  holes  in  pre-existing  traps,  generation  of  new  traps 
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and  the  generation  of  interface  traps  [36].  These  traps  may  be  classified  by  location 
[38] while their effects vary. Interface traps are located at or near the Si-SiO2 interface
and directly affect transconductance, leakage current and noise level. Oxide traps are
located further away from the interface and affect the long term MOSFET stability,
specifically threshold voltage. Effects of the defect generation include threshold
voltage shifts, transconductance degradation and drain current reduction [37]. NBTI
seems to have similar degradation patterns, except for pMOS, so both will be treated as
the same in this work.

Hu  [39]  proposed  the  ‘lucky’  electron  model  for  hot  carrier  effects.  This  is  a 
probabilistic model built on the concept that an electron must first gain enough kinetic
energy  from  the  channel  to  become  ‘hot’,  and  then  the  electron’s  momentum  must 
become  redirected  perpendicularly  so  the  electron  can  enter  the  oxide.  The  following 
explanation  of  the  lucky  electron  model  is  adapted  from  Ohring  [18]. 

The  model  begins  by  defining  the  probability  that  an  electron  can  travel  a  distance 
d  or  more  without  a  collision, 

\[ P(\text{distance} \geq d) = \exp\left(-\frac{d}{\lambda_e}\right) \qquad (3.13) \]

where $\lambda_e$ is the mean free path between scattering events, $d = \phi / (q \mathcal{E}_c)$, $\phi$ is energy, $\mathcal{E}_c$ is
the channel electric field and $q$ is the electron charge. To reach the gate oxide, electrons
need to gain sufficient energy to become ‘hot’. In addition, they must be redirected on a
path perpendicular to the channel. The currents at the substrate ($i_{sub}$) and gate ($i_{gate}$)
are indicators of the creation of hot carriers. The model for $i_{sub}$ is

\[ i_{sub} = C_1\, i_{drain} \exp\left(-\frac{\beta_i}{\mathcal{E}_c}\right) \qquad (3.14) \]

The exponential term comes from the dependence of the impact ionization coefficient
on $1/\mathcal{E}_c$ while $i_{drain}$ is the drain current. The parameter $C_1$ is a constant, or more
precisely a weak function of $\mathcal{E}_c$ and the device parameters. Hu states $C_1 \approx 2$ and
$\beta_i = 1.7 \times 10^6$. By defining $\beta_i$ as the ratio $\phi_i / (q \lambda_e)$, where $\phi_i$ is the minimum energy a
hot electron requires in order to create impact ionization, Eq. 3.14 can be rewritten as

\[ i_{sub} = C_1\, i_{drain} \exp\left(-\frac{\phi_i}{q \lambda_e \mathcal{E}_c}\right) \qquad (3.15) \]


Similarly  the  gate  current  is  defined  by 


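\[ i_{gate} = C_2\, i_{drain} \exp\left(-\frac{\phi_{it}}{q \lambda_e \mathcal{E}_c}\right) \qquad (3.16) \]

Canceling the $\lambda_e \mathcal{E}_c$ product between Eqs. 3.15 and 3.16 results in

\[ \frac{i_{gate}}{i_{drain}} = C_2 \left(\frac{i_{sub}}{C_1\, i_{drain}}\right)^{m} \qquad (3.17) \]

where $m = \phi_{it}/\phi_i \approx 3$ [18, 36] and represents the energy of the electrons causing
damage.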

During normal operation, the value of $i_{gate}$ is negligible. Degradation due to
hot carriers is proportional to $i_{gate}$, making gate current a good measure of the damage.
If the damage resulting from HCD is designated by $\Delta$ with the time rate of change
proportional to $i_{gate}$ [18], then

\[ \frac{d\Delta}{dt} \sim i_{gate} \quad\Rightarrow\quad \frac{d\Delta}{dt} = \frac{A(\Delta)}{W}\, i_{drain} \left(\frac{i_{sub}}{i_{drain}}\right)^{m} \qquad (3.18) \]

The constants $C_1$ and $C_2$ have been absorbed into $A(\Delta)$ along with an additional factor
to account for the dependence of HCD on existing damage while $W$ is the width of the
MOSFET. By letting $B = A(\Delta)/W$, and knowing that MTTF depends on the reciprocal
of $d\Delta/dt$, the failure rate is found from

\[ \lambda = B\, i_{drain} \left(\frac{i_{sub}}{i_{drain}}\right)^{m} \qquad (3.19) \]

This equation assumes static (dc) voltages and currents. To account for dynamic
degradation we can use

\[ \lambda = \frac{B}{T_c} \int_0^{T_c} i_{drain} \left(\frac{i_{sub}}{i_{drain}}\right)^{m} dt \qquad (3.20) \]

where $T_c$ is the full cycle time.
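
A numerical sketch of the dynamic case follows. It assumes the dynamic failure rate is the cycle average of the static expression, B times the drain current times the substrate-to-drain current ratio raised to the power m, and it approximates the integral with the trapezoidal rule over sampled waveforms. The function names, the sampled waveforms and the value of B are hypothetical.

```python
def static_rate(b, i_drain, i_sub, m):
    """Static (dc) rate: b * i_drain * (i_sub / i_drain) ** m."""
    return b * i_drain * (i_sub / i_drain) ** m


def dynamic_rate(b, times, i_drain, i_sub, m):
    """Cycle-averaged rate by trapezoidal integration over one full cycle.

    times, i_drain and i_sub are equal-length sequences of samples;
    times must be strictly increasing and span the cycle time.
    """
    if not (len(times) == len(i_drain) == len(i_sub)) or len(times) < 2:
        raise ValueError("need at least two aligned samples")
    # Samples with no drain current contribute nothing to the integral.
    integrand = [static_rate(1.0, d, s, m) if d > 0 else 0.0
                 for d, s in zip(i_drain, i_sub)]
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (integrand[k] + integrand[k - 1]) * (times[k] - times[k - 1])
    return b * total / (times[-1] - times[0])


# Hypothetical waveform: the device conducts for only part of the cycle.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
i_d = [0.0, 1e-3, 1e-3, 0.0, 0.0]
i_s = [0.0, 1e-6, 1e-6, 0.0, 0.0]
print(dynamic_rate(2.0, t, i_d, i_s, 3), static_rate(2.0, 1e-3, 1e-6, 3))
```

For a constant waveform the dynamic result reduces to the static one, which gives a simple consistency check on any sampled data.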

Temperature  plays  an  interesting,  though  small  role  in  hot  carrier  degradation.  As 
mentioned  before,  the  activation  energy  for  HCD  is  negative,  implying  that  HCD 
reduces  with  increasing  temperature.  At  low  temperatures,  substrate  current  increases 
because  drain  current  increases.  According  to  Acovic  [40],  the  effects  of  oxide 
degradation  are  increased  at  low  temperatures  because  electrons,  due  to  their  lower 
thermal  energy,  have  a  hard  time  surmounting  the  potential  barrier  in  the  negatively 
charged degraded zone. Another possibility is that freeze out of impurities in the drain
at low temperatures makes nMOSFETs more sensitive to electrons trapped in the drain
region,  increasing  degradation.  Degradation  decreases  at  high  temperatures  because  of 
the  decreases  in  drain  current  and  mean  free  path. 

NBTI  differs  from  hot  carrier  degradation  in  that  NBTI  causes  a  shift  in  the  device 
threshold voltage. The mechanism for NBTI damage is the trapping of holes at the
interface between the SiO2 gate insulator and the Si substrate. NBTI damage is most
prevalent in pMOSFET devices where holes are thermally activated and gain sufficient
energy to dissociate the interface/oxide defects near the Lightly Doped Drain (LDD)
regions  [41].  This  happens  at  the  LDD  regions  because  of  the  higher  hole 
concentrations  near  the  gate  edge. 

The  first  stage  of  the  NBTI  failure  process  begins  with  the  generation  of  interface 
states  and  the  production  of  hydrogen  atoms/ions  at  the  interface.  In  time,  the  transport 
of  hydrogen  atoms  in  the  oxide  dominates  [41].  The  diffusion  of  hydrogen  is  controlled 
by  two  factors.  The  first  is  the  oxide  field  resulting  from  existing  hole  trapping  and  the 
formation  of  positive  fixed  oxide  charges.  The  second  factor  is  the  increase  in  interface 
states. The diffusion, or generation of more interface states, is discouraged by this
increase. The gradual saturation of $\Delta V_{th}$ is attributed to the formation of oxide-trapped
holes and fixed oxide charges which modify the oxide field to oppose the further
transport of hydrogen atoms.

3.4.2.2 Lifetime Prediction

The  lucky  electron  model  does  not  fully  answer  the  question  on  how  to  predict  hot 
carrier  degradation  lifetime.  Many  other  researchers  have  offered  models  for  lifetime, 
but  none  are  fully  accepted.  Since  there  is  no  direct  method  of  measuring  device 
lifetime  [42],  the  Arrhenius  relationship  remains  a  favored  lifetime  prediction  tool.  The 
following models are from JEP-122A [22]. It contains two models, the first for nMOS
devices and the second for pMOS.

The N-Channel model is for nMOS devices. In these devices substrate current is an
indicator of hot carriers. The equation is as follows:

\[ \mathrm{MTTF} = B\, (i_{sub})^{-N} \exp\left(\frac{E_a}{kT}\right) \qquad (3.21) \]

where B is a scale factor as a function of doping profiles, sidewall spacing, dimensions,
etc., $i_{sub}$ is substrate current, N ranges from 2 to 4, and $E_a$ is the activation energy in the
range of -0.1 eV to -0.2 eV. In pMOS devices, hot holes do not show up as substrate
current. However the gate current can serve as an indicator of hot carriers. Thus the
P-Channel model is:

\[ \mathrm{MTTF} = B\, (i_{gate})^{-M} \exp\left(\frac{E_a}{kT}\right) \qquad (3.22) \]

where B and $E_a$ are the same as before while $i_{gate}$ is the peak gate current during
stressing and M ranges from 2 to 4. However, the Arrhenius term is not necessarily
appropriate at all for these mechanisms.
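
Both channel models share one functional form, so a single helper can illustrate them. The sketch below is an illustration only: the function name is hypothetical, the scale factor B is set to 1 so that only ratios are meaningful, and the activation energy of -0.15 eV is simply a value inside the quoted range.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # eV/K


def hcd_mttf(b, indicator_current, exponent, e_a_ev, temp_k):
    """MTTF = b * i ** (-exponent) * exp(Ea / kT).

    For nMOS pass the substrate current and N (form of Eq. 3.21); for
    pMOS pass the peak gate current and M (form of Eq. 3.22).
    """
    if indicator_current <= 0 or temp_k <= 0:
        raise ValueError("current and temperature must be positive")
    return b * indicator_current ** (-exponent) * math.exp(
        e_a_ev / (BOLTZMANN_EV_PER_K * temp_k))


# With a negative activation energy the predicted lifetime rises with temperature.
hot = hcd_mttf(1.0, 1e-6, 3, -0.15, 398.15)
cold = hcd_mttf(1.0, 1e-6, 3, -0.15, 298.15)
print(f"MTTF(125 C) / MTTF(25 C) = {hot / cold:.2f}")
```

The ratio printed is greater than one, reflecting the negative activation energy quoted for these models.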

A simplified version of Eq. 3.19 [36] may be used to relate lifetime prediction to
the drain voltage and, ultimately, the supply voltage ($V_{dd}$).

\[ \mathrm{MTTF} = C \exp\left(\frac{B}{V_{dd}}\right) \qquad (3.23) \]

In this equation both C and B are constants determined from life testing. The limitation
is that this model is valid for only a small range of gate voltages near the maximum
substrate current. Here, the Arrhenius term is not used because of the small effect of
temperature on HCD.
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The  lifetime  for  NBTI  is  described  by  a  simple  power  law  relationship  [41] 


\[ \mathrm{MTTF} = \left(\frac{\Delta V_{th}}{C}\right)^{1/n} \qquad (3.24) \]

where $\Delta V_{th}$ is the change in threshold voltage, C is a constant, and n is the rate. As an
example, in their experiment Chen et al. [41] determined n ≈ 0.71 during the reaction
limited portion. This changes to n ≈ 0.37 as the process becomes
diffusion controlled.
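
A short sketch of the power-law relationship follows. It assumes the threshold-voltage shift grows with time as C times t to the power n, so the time needed to reach a chosen failure criterion is the shift divided by C, raised to the power 1/n. The function names and the value of C are hypothetical; the two exponents are the ones quoted from Chen et al.

```python
def nbti_shift(t, c, n):
    """Threshold-voltage shift after time t, assuming shift = c * t ** n."""
    return c * t ** n


def nbti_time_to_shift(delta_vth, c, n):
    """Time for the shift to reach delta_vth under the same power law."""
    if delta_vth <= 0 or c <= 0 or n <= 0:
        raise ValueError("all arguments must be positive")
    return (delta_vth / c) ** (1.0 / n)


C_HYPOTHETICAL = 0.001  # placeholder constant; units depend on the time unit chosen
for n in (0.71, 0.37):
    t_fail = nbti_time_to_shift(0.05, C_HYPOTHETICAL, n)
    print(f"n = {n}: time to a 0.05 V shift = {t_fail:.3g}")
```

With the same constant, the smaller exponent of the diffusion controlled regime gives a much longer time to reach the same shift.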


3.4.2.3 Lifetime Distribution Model

There  is  little  discussion  in  literature  about  a  proper  statistical  lifetime  distribution 
model  for  hot  carrier  degradation.  A  logical  hypothesis  for  the  form  of  the  lifetime 
distribution  would  be  the  exponential.  This  is  a  good  assumption  because  as  devices 
become  more  complex,  with  millions  of  gates,  the  device  becomes  a  complex  system. 
The  probability  of  failure  for  each  individual  gate  most  likely  is  not  an  exponential 
distribution.  But  the  cumulative  effect  of  early  failures  and  process  variability,  ensuring 
each  gate  has  a  different  failure  rate,  widens  the  spread  of  the  device  failures.  The  end 
result  is  that  the  intrinsic  hot  carrier  degradation  becomes  more  random  and  statistically 
indistinguishable  from  random  failures  as  the  failures  occur  at  a  constant  rate  over  time. 

3.4.2.4 Lifetime Sensitivity

Like  electromigration,  the  sensitivity  of  hot  carrier  degradation  lifetime  to  changes  in 
the  input  parameters  may  be  observed.  Using  Eq.  3.23  as  an  example,  the  acceleration 
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factor  for  hot  carrier  degradation  is, 


\[ A_f = \exp\left[B\left(\frac{1}{V_{dd}} - \frac{1}{V_{dd,max}}\right)\right] \qquad (3.25) \]

The response of $A_f$ to scaled values of $V_{dd}$ is plotted in Figure 3.5.

Figure 3.5: $A_f$ versus $V_{dd}$ scaling multiplier. $V_{dd,max}$ = 3.3 V and B = 70 [36].
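
The voltage sensitivity can be sketched in the same way as for electromigration. The code assumes the simplified lifetime model has an exponential dependence on B divided by the supply voltage, and that the acceleration factor is the ratio of lifetime at the scaled voltage to lifetime at the maximum rated voltage, so the constant C cancels. Both are assumptions of this sketch; the numeric values are those given with Figure 3.5, with B taken to be in volts, and the function name is hypothetical.

```python
import math

VDD_MAX = 3.3   # V, from the Figure 3.5 caption
B_CONST = 70.0  # from the Figure 3.5 caption (units of volts assumed)


def af_hcd(vdd_scale):
    """Assumed acceleration factor: MTTF(scale * VDD_MAX) / MTTF(VDD_MAX)."""
    if vdd_scale <= 0:
        raise ValueError("scaling multiplier must be positive")
    vdd = vdd_scale * VDD_MAX
    return math.exp(B_CONST / vdd - B_CONST / VDD_MAX)


for s in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(f"{s:.1f}  A_f = {af_hcd(s):.3e}")
```

Under these assumptions a modest reduction in supply voltage produces a large gain in predicted lifetime, which is why voltage derating is attractive for this mechanism.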


3.4.2.5 Outlook

Hot  Carrier  Degradation  is  expected  to  be  a  reliability  concern  as  device  feature  sizes 
continue  to  shrink.  HCD  is  a  function  of  electrical  fields  internal  to  the  device.  Channel 
length,  oxide  thickness  and  device  operating  voltage  all  affect  the  strength  of  the  fields 
and  the  rate  of  degradation.  As  devices  are  scaled  downwards,  channel  lengths  get 
shorter  decreasing  hot  carrier  reliability.  However,  the  oxide  thickness  and  voltage  can 
also  be  reduced  to  help  alleviate  the  reduction  in  reliability.  Other  methods  of 
improving  hot  carrier  reliability  include  possibly  shifting  the  position  of  the  maximum 
drain so it is deeper in the channel [40]. This would result in hot carriers being
generated further away from the gate and Si-SiO2 interface, reducing the likelihood of
being injected into the gate. Another method is to reduce the substrate current by using
a Lightly-Doped-Drain (LDD) where part of the voltage drop is across a lightly doped drain
extension not covered by the gate. Annealing the oxides in NH3, N2O or NO or
growing them directly in N2O or NO improves their resistance to interface state
generation by the hot carriers.

NBTI has become a concern as device feature sizes have shrunk. NBTI became evident
with 0.13 μm processes as devices required much thinner gate oxides and introduced
nitrides in the SiO2 to prevent boron penetration into the gate [35]. Another source of
concern is plasma-induced damage during interconnect creation, which drives
hydrogen atoms into the Si-SiO2 interface.

3.4.3  Time  Dependent  Dielectric  Breakdown 

Time  Dependent  Dielectric  Breakdown  (TDDB),  also  known  as  oxide  breakdown,  is  a 
source  of  significant  reliability  concern  for  future  semiconductor  devices.  When  an 
electric  field  is  applied  across  the  dielectric  gate  of  a  transistor,  the  continued 
degradation  of  the  material  results  in  the  formation  of  conductive  paths  and  the 
shorting  of  the  anode  and  cathode  [22].  The  concern  is  that  this  process  will  be 
accelerated  as  the  thickness  of  the  gate  oxide  decreases  with  continued  device  scaling. 

The  TDDB  process  takes  place  in  two  stages  [43].  In  the  first  stage,  the  oxide  is 
damaged  and  degraded  over  a  long  time  period  from  the  localized  hole  and  bulk 
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electron  trapping  within  the  oxide  and  at  the  oxide  interfaces.  The  second  stage  is 
reached  when  the  increasing  number  of  traps  within  the  oxide  form  a  percolation 
(conduction)  path  through  the  oxide  (see  Figure  3.6).  This  short  circuit  between  the 
substrate  and  gate  electrode  results  in  the  failure  of  the  oxide.  This  process  has  been 
successfully modeled using Monte Carlo simulations (a percolation model) to randomly
create  spherical  traps. 


Figure  3.6:  Formation  of  a  percolation  path.  A  small  number  of  traps  (circles) 
are  initially  in  the  oxide  (a).  Over  time  more  traps  form  (b)  until  the  number 
of traps is great enough to create an interconnected conduction path through the
oxide  (c). 
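
The percolation idea can be illustrated with a deliberately simplified Monte Carlo sketch. Instead of the spherical traps used in the published simulations, it places traps on the cells of a two-dimensional square lattice and declares breakdown when trapped cells form a connected path from the top row to the bottom row. The lattice model, the function names and the grid sizes are all assumptions made for illustration.

```python
import random
from collections import deque


def spans_oxide(trapped, width, thickness):
    """Breadth-first search for a trap path from row 0 to row thickness - 1."""
    frontier = deque((0, col) for col in range(width) if (0, col) in trapped)
    seen = set(frontier)
    while frontier:
        row, col = frontier.popleft()
        if row == thickness - 1:
            return True
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if nxt in trapped and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def traps_to_breakdown(width, thickness, rng):
    """Add traps at random empty cells until a spanning path forms; return the count."""
    cells = [(r, c) for r in range(thickness) for c in range(width)]
    rng.shuffle(cells)
    trapped = set()
    for count, cell in enumerate(cells, start=1):
        trapped.add(cell)
        if spans_oxide(trapped, width, thickness):
            return count
    raise RuntimeError("unreachable: a completely filled lattice always spans")


rng = random.Random(0)
thin = [traps_to_breakdown(20, 3, rng) for _ in range(200)]
thick = [traps_to_breakdown(20, 6, rng) for _ in range(200)]
print("mean traps at breakdown:", sum(thin) / len(thin), sum(thick) / len(thick))
```

Repeating the simulation gives a distribution of trap counts at breakdown for each thickness, which is the quantity such models relate to time to breakdown.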

The  formation  of  the  percolation  path  may  result  in  one  of  two  types  of  failure. 
When  the  path  forms,  current  flow  through  the  path  causes  a  sudden  release  of  energy. 
If  this  energy  is  sufficient  it  will  cause  runaway  thermal  heating  and  melting  of  the 
oxide,  destroying  the  gate.  This  is  termed  hard  breakdown.  If  there  is  not  sufficient 
energy  to  result  in  hard  breakdown,  then  a  soft  breakdown  will  result.  With  a  soft 
breakdown  the  device  continues  to  function.  It  has  been  speculated  that  soft  breakdown 
does  not  even  significantly  affect  transistor  operation  [44],  although  it  may  still  lead  to 
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the  failure  of  short  channel  devices.  However,  while  the  change  in  both  threshold 
voltage and leakage from soft breakdown is small and initially does not affect device
operation,  the  effects  are  cumulative.  It  may  be  possible  that  multiple  soft  breakdowns 
will  result  in  an  increase  in  leakage  current  to  unacceptable  levels  [45]. 

Different  authors  define  TDDB  lifetime  differently.  For  example,  one  definition  of 
lifetime  is  the  time  required  for  the  degradation  to  build  up  to  the  level  required  for 
runaway  [43],  a  hard  breakdown.  Alternatively,  failure  may  be  defined  as  the  time  until 
the  first  detectable  electrical  event  [46].  This  may  be  either  a  soft  or  hard  breakdown. 
Given  this,  the  definition  of  failure  depends  on  the  function  of  the  device  and  what  the 
TDDB  effects  are  on  that  device’s  proper  operation. 

3.4.3.1 Physics of Failure

The  mechanisms  of  oxide  failure  are  still  a  subject  of  discussion.  Several  different 
theories  to  explain  the  phenomenon  have  been  put  forth.  Two  of  the  leading  theories 
are  the  Anode  Hole  Injection  (1/E)  model  and  the  thermochemical  (E)  model.  Both  of 
these  models  fit  the  experimental  data  in  certain  ranges  of  temperature  and  field,  but  the 
controversy  over  which  is  better  remains.  Toward  this  end,  explanations  have  been 
published  which  suggest  that  both  models  are  simply  parts  of  a  larger  model.  In 
contrast,  the  latest  theories  explain  TDDB  as  having  a  voltage  driven  mechanism  [47]. 

The  Anode  Hole  Injection  model,  or  1/E  model,  assumes  that  the  failure  rate  is 
inversely  related  to  the  field.  Wear-out  occurs  as  shallow  traps  are  formed  within  the 
oxide. These traps weakly bind charge and, as time progresses, new traps form and
older traps deepen. More charge is captured until a critical value $Q_{BD}$ is reached [18].
The time to breakdown is given by

\[ \mathrm{MTTF} = \frac{Q_{BD}}{J_{FN}} = t_0 \exp\left(\frac{G_r}{E_{ox}}\right) \tag{3.26} \]

where $J_{FN}$ is the tunneling current, $E_{ox}$ is the oxide field, $t_0$ is a prefactor and $G_r$ is the
field acceleration parameter. The value of $G_r$ ranges from 290 to 350 MV/cm
depending on the oxide thickness and stress type [36]. Ohring [18] defines

\[ t_0 = 5.4 \times 10^{-7} \exp\left(-\frac{0.28\ \mathrm{eV}}{kT}\right)\ \mathrm{s} \]
\[ G_r(T) = 120 + [\text{temperature-dependent term illegible in source}]\ \mathrm{MV/cm} \]

where  these  two  variables  are  temperature  dependent  and  take  into  account  hole 
generation  and  trapping  efficiencies. 
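
As an illustration, the 1/E relation of Eq. 3.26 can be evaluated numerically. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the prefactor uses the Ohring expression for t_0 quoted above, and the field acceleration parameter is passed in as a plain number (350 MV/cm, the upper end of the quoted range, is used only as an example value).

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def t0_ohring(temp_k):
    """Prefactor t0 in seconds, from the Ohring [18] expression quoted in the text."""
    return 5.4e-7 * math.exp(-0.28 / (K_BOLTZMANN_EV * temp_k))


def mttf_inverse_field(e_ox, g_r, temp_k):
    """1/E-model time to breakdown (Eq. 3.26); e_ox and g_r are in MV/cm."""
    return t0_ohring(temp_k) * math.exp(g_r / e_ox)


# Example operating point (assumed values, for illustration only)
for field in (6.0, 8.0, 10.0):
    print(f"E_ox = {field} MV/cm -> MTTF = {mttf_inverse_field(field, 350.0, 300.0):.3g} s")
```

Because the field enters as 1/E, the predicted lifetime grows very quickly as the field is reduced, which is why this model gives more optimistic low-field extrapolations than the E-model.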

The thermochemical model, or E-model, assumes a direct correlation between the
electric field and oxide degradation. This assumption has not been conclusively proven
[36], but the model does provide a good fit with experimental data. Using this model,
the time to failure is given by

\[ \mathrm{MTTF} = t_0 \exp\left(-\gamma E_{ox}\right) \tag{3.27} \]

where $t_0$ and γ are empirically determined constants. JEP-122A [22] defines the
E-model similarly, but includes an Arrhenius term as well. Thus

\[ \mathrm{MTTF} = t_0 \exp\left(-\gamma E_{ox}\right) \exp\left(\frac{E_a}{kT}\right) \tag{3.28} \]

The question of which model, if either, is right is an active area of research by
many parties. Both models fit the TDDB data well over limited ranges of the electric
field [48]. Groeseneken [36] states that both the E and 1/E-models are mainly valid for
oxide thicknesses greater than 5 nm. Yassine [49], in testing TDDB for oxide fields
ranging from 4.6 to 10.4 MV/cm, reports that the TDDB of ultrathin oxides follows the
E-model down to 4.6 MV/cm, with the 1/E-model deviating from empirical data below
7.2 MV/cm. McPherson [48] argues both models are correct and are part of a
complementary  model  where  both  field-induced  (E-model)  and  current-induced 
(1/E-model)  degradation  mechanisms  occur  simultaneously.  If  one  mechanism  is 
dominant,  then  the  model  reduces  to  either  the  E  or  1/E-model.  For  example,  when 
electric  fields  are  greater  than  3  MV/cm  (with  molecular  bond  strengths  greater  than 
3  eV),  it  reduces  to  the  1/E-model.  When  bond  strengths  are  below  3  eV,  the  E-model 
dominates. 

The debate about E vs. 1/E models is most applicable for thick oxides. For
ultra-thin oxides evidence shows that gate voltage is the primary driver of the
breakdown process [50]. Additionally, there is evidence that the temperature
dependence of ultra-thin oxides is non-Arrhenius. Observations show the temperature
acceleration factor is larger at higher temperatures. To account for these observations,
Wu et al. [50] have proposed a relationship in the form of

\[ \mathrm{MTTF} = T_{BD0}(V) \exp\left(\frac{a(V)}{T} + \frac{b(V)}{T^{2}}\right) \tag{3.29} \]

where $T_{BD0}(V)$ is a voltage dependent prefactor and $a$ and $b$ are both voltage dependent
as well. The second order term, $b/T^2$, is included in order to account for any
non-Arrhenius temperature effects. Values for the terms were not yet determined.
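
As a short worked consequence of Eq. 3.29, the temperature dependence can be expressed as an effective activation energy. This is a sketch only: the symbol $E_{a,\mathrm{eff}}$ is introduced here for illustration and does not appear in the cited work, and $k$ is Boltzmann's constant.

```latex
\ln \mathrm{MTTF} = \ln T_{BD0}(V) + \frac{a(V)}{T} + \frac{b(V)}{T^{2}}
\quad\Longrightarrow\quad
E_{a,\mathrm{eff}}(T) \equiv k\,\frac{\partial \ln \mathrm{MTTF}}{\partial (1/T)}
  = k\left(a(V) + \frac{2\,b(V)}{T}\right)
```

Setting $b = 0$ recovers the ordinary Arrhenius case with a temperature-independent activation energy $E_a = k\,a$; any nonzero $b$ makes the effective activation energy vary with temperature.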
3.4.3.2 Lifetime Distribution Model


The  model  typically  used  in  accelerated  testing  to  extrapolate  failure  time  is  the 
lognormal distribution. However, for TDDB the Weibull distribution provides a more
accurate fit for large samples of time to failure data [46]. As oxide thickness decreases,
the time to failure distribution becomes wider. This results in the Weibull shape
parameter (β) decreasing as the gate oxide thickness decreases [46, 51, 52]. The decrease
in β is a result of $N_{BD}$ decreasing with the oxide thickness. When the oxide thickness
equals the diameter of a defect, then only one defect is required to cause a failure. This
corresponds to a β of one and happens with oxides about or below 2.2-2.7 nm [53]. In
addition, variations in oxide thickness widen the failure distribution,
further contributing to a reduction in the β value. Experimental results [50] show that
the β factor is independent of temperature and voltage.
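
The link between a lower β and a wider failure distribution can be made concrete with the Weibull quantile function. The following is a minimal Python sketch; the function names and the β values are illustrative assumptions, and the characteristic life is normalized to one.

```python
import math


def weibull_cdf(t, beta, eta):
    """Fraction of the population failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_quantile(p, beta, eta):
    """Time by which a fraction p of the population has failed."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


# Time to the first 0.1% of failures, relative to the characteristic life
for beta in (3.0, 2.0, 1.0):
    early = weibull_quantile(0.001, beta, 1.0)
    print(f"beta = {beta}: 0.1% failed by {early:.4g} x characteristic life")
```

For the same characteristic life, a smaller β moves the early-failure quantile much closer to zero, which is the practical meaning of a wider time to failure distribution.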


3.4.3.3 Lifetime Sensitivity

As an example, the derating factor for TDDB may be computed using the E-model, as it
is the most accepted model for thin oxide. The acceleration factor is

\[ A_{f,TDDB} = \exp\left[\gamma\left(E_{ox} - E_{ox,rated}\right)\right] \]

Figure 3.7 plots the response of $A_f$ versus the scaling multiplier for temperature and
electric field ($E_{ox}$).

Figure 3.7: $A_{f,TDDB}$ versus scaling multiplier for T (dashed line) and $E_{ox}$
(solid line). $E_{a,TDDB}$ = 0.75 eV, T = 85°C, $E_{ox}$ = 4 MV/cm, γ = 3 Naperians
per MV/cm.
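
The sensitivity shown in Figure 3.7 can be approximated numerically. The sketch below is an assumption-laden illustration, not the code used to produce the figure: it takes the acceleration factor as the ratio of rated to operating MTTF under the E-model with the Arrhenius term of Eq. 3.28, treats γ as 3 per MV/cm, and uses the rated conditions listed in the figure caption. The function name and argument names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def af_tddb(e_ox, e_rated, temp_c, temp_rated_c, gamma=3.0, ea=0.75):
    """MTTF(rated) / MTTF(operating) for the E-model with an Arrhenius term.

    Fields are in MV/cm, temperatures in degrees Celsius, ea in eV.
    A value below one means the derated part is expected to last longer.
    """
    t_op = temp_c + 273.15
    t_rated = temp_rated_c + 273.15
    field_term = math.exp(gamma * (e_ox - e_rated))
    thermal_term = math.exp((ea / K_BOLTZMANN_EV) * (1.0 / t_rated - 1.0 / t_op))
    return field_term * thermal_term


# Field derating at the rated temperature of 85 C, rated field of 4 MV/cm
for multiplier in (1.0, 0.9, 0.8):
    print(multiplier, af_tddb(4.0 * multiplier, 4.0, 85.0, 85.0))
```

Because both stresses enter through exponentials, modest reductions in field or temperature produce large changes in the acceleration factor.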

3.4.3.4 Outlook

The  expected  high  electrical  fields  in  future  semiconductor  devices  will  have  an  impact 
on  the  degradation  of  the  gate  oxides.  This  degradation  can  eventually  lead  to  a  sudden 
breakdown  of  the  dielectric  layer  and  failure  of  the  device.  A  complete  understanding 
of this process does not yet exist [36] and its implications for future devices are not
completely  known. 

Oxide  thickness  will  continue  to  be  scaled  in  future  devices  because  of  the  need  to 
improve  and  optimize  circuit  performance  [43].  Groeseneken  [36]  shows  that  for  an 
oxide thickness ranging from 4.1 nm to 1.7 nm there is a drop of several orders of magnitude
in time to breakdown. He concludes this may be a “showstopper for the further
downscaling  of  oxide  thickness”. 

In contrast to these results, Wu [54] predicts that the dielectric lifetime using the
E-model is almost infinite for a nitride/oxide in the 1.8 nm range. He further states that
the existence of TDDB for ultrathin oxides under constant-current or constant-voltage
stress  is  arguable.  These  contrasting  viewpoints  show  there  is  still  a  need  to  better 
understand  TDDB  as  device  features  shrink. 


3.5  Impact  of  Technology  Nodes  on  Lifetime 

As  discussed  in  Chapter  1,  the  size  of  semiconductor  technology  nodes  has  decreased 
at a steady rate and this trend is expected to continue. Substituting the device
parameters for each node into the lifetime equations from Section 3.4 allows us to see
how  device  reliability  changes  due  to  electromigration,  hot  carrier  and  TDDB 
mechanisms. 

The ITRS 2001 Roadmap [4] provides several tables of IC trends. Using data from
the roadmap, Figure 3.8 shows how the MTTF for electromigration, TDDB and hot
carrier effects varies with changing technology nodes. This plot is for high performance
nMOS devices, such as microprocessors and ASICs. The vertical scale is the MTTF for
each of the failure mechanisms normalized to the 2001 node, which is defined as having
an MTTF of 1 unit for each mechanism. Similar plots can be made for DRAM and low power
devices.  To  give  an  idea  of  how  years  correspond  to  technology  nodes,  Table  3.1  lists 
the  half-pitch  for  several  years. 

The  graph  in  Figure  3.8  has  some  limitations  which  have  to  be  taken  into  account 
when interpreting it. A major limitation is that it does not allow for any change in
technology. All the lifetime models have technology constants. Some of the changes in
the constants are a result of a change in manufacturing process or materials. Other
changes in the technology constants result from intentional changes made to improve
device reliability. As an example of changing technology constants, consider
electromigration. Figure 3.8 shows electromigration MTTF decreasing sharply. In
reality electromigration reliability hasn’t shown this behavior. The reason is that
technology has changed. To combat the risk of electromigration, semiconductor
manufacturers  have  made  changes  in  the  interconnect  alloys  used  and  in  IC  design, 
such  as  using  multiple  levels  of  metallization  to  reduce  current  densities.  Additionally, 
every  semiconductor  device  will  inherently  have  different  technology  constants  due  to 
different  design  features  and  manufacturing  processes. 

The graph was built using the models presented in Section 3.4. Starting with
electromigration, in Black’s equation (Eq. 3.9), the parameter directly dependent on
scaling is the current density ($j_e$). The ITRS data provided information on feature sizes
and $V_{dd}$, but not $j_e$. Using the feature size data to derive the scaling factor ($K$), the
change in interconnect cross-sectional area is $A_{line} = A_{line,0} \cdot 1/K^2$. The current density
change is $j_e \propto V_{dd}/K^2$. By substituting this into Black’s equation,
$\mathrm{MTTF}_{EM} \propto (V_{dd}/K^2)^{-n}$. As shown in Figure 3.8, the electromigration failure rate increases with
scaling as the current density increases due to decreasing interconnect cross-sectional
area. This plot used a value of n = 2.
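
The normalization used for the electromigration curve can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to build Figure 3.8: it keeps only the current density term of Black's equation (temperature and technology constants held fixed), and the function name and the example ratios are hypothetical.

```python
def relative_em_mttf(current_density_ratio, n=2.0):
    """MTTF relative to a reference node when only current density changes.

    current_density_ratio is j / j_reference; n is the current density
    exponent in Black's equation (n = 2 as used for the plot).
    """
    return current_density_ratio ** (-n)


# Example: current density rising relative to the reference node
for ratio in (1.0, 1.5, 2.0):
    print(f"j/j_ref = {ratio}: MTTF/MTTF_ref = {relative_em_mttf(ratio):.3f}")
```

With n = 2, doubling the current density cuts the normalized MTTF to one quarter, which is why the unmitigated curve falls so steeply with scaling.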

Hot carrier lifetime was determined using Eq. 3.19. Using this equation,
B = A(A)/W where $W \propto 1/K$. The value of $i_{drain}$ is proportional to the device current
given by ITRS. This results in $\mathrm{MTTF}_{HC} \propto 1/K$. For TDDB, the ITRS data directly
defined the electric field across the oxide. Using this, $\mathrm{MTTF}_{TDDB} \propto e^{-\gamma E_{ox}}$.

As  shown  in  Figure  3.8  and  assuming  unchanging  technology,  scaling  tends  to 
increase  the  failure  rates  for  electromigration  and  hot  carrier  effects.  TDDB  shows  an 
inflection  around  2000  because  the  original  ITRS  data  shows  a  corresponding  peak  in 
field  strength  across  the  dielectric  at  this  point. 

3.6  Summary 

This chapter provides the background information necessary to understand the
potential for device wearout and the impact of continued device scaling. The aerospace
industry is concerned about the possibility of reduced semiconductor device lifetime
from three major failure mechanisms: electromigration, hot carrier effects and time
dependent oxide breakdown. A review of the literature shows that all three are
valid areas of concern.

Semiconductor manufacturers are aware of the problems with scaling and are
working to introduce new materials and processes into their products. This is referred
to as “equivalent” scaling since traditional materials are reaching their scaling limits and
new materials provide a means to continue scaling.

The semiconductor industry remains competitive and manufacturers will continue
to strive for more performance and a better price/performance ratio. More transistors and
meters of interconnects within an IC mean that the reliability per gate and unit length of
metallization must increase as the shrinking feature sizes push their reliability
downward.

These trends will push manufacturers to what Dellin [55] calls “just enough IC”.
Technology will be optimized to just meet the needs of major commercial customers,
meaning that unnecessary reliability margins will be eliminated.

While future devices will likely meet a typical commercial customer’s reliability
expectations, the reduction or elimination of reliability margins will have an impact on
some non-typical customers. The military/aerospace market has effectively relied on
that  margin  to  provide  the  necessary  reliability  in  COTS  devices  for  demanding 
aerospace  applications.  Without  it,  their  reliability  needs  may  not  be  met. 

As  future  products  are  developed  and  sold,  the  aerospace  industry  will  have  to 
remain  abreast  of  the  reliability  trends.  It  also  has  to  develop  alternative  courses  of 
action  to  mitigate  the  impact  of  any  loss  in  reliability,  reduction  in  lifetime,  or  increase 
in wearout in these future devices.

Chapter 4

Mitigating  the  Impact  of  Decreasing  Device 
Reliability  in  Aerospace  Applications 


The focus at the start of this research, given the concern of accelerated device wearout,
was to develop design guidelines to mitigate the impact of wearout. But as discussed in
Chapters 1 and 3, the source of decreasing reliability is a market as well as technology
driven problem. With a market driven problem, there cannot be a solely technological
solution. The military/aerospace industry is a very small market for device
manufacturers. As such, they are unlikely to devote time and money to ensure their
products meet aerospace needs. There is no profit in that course of action. The
aerospace  industry  must  work  within  the  constraint  of  having  to  use  whatever  devices 
the  semiconductor  industry  supplies  for  their  leading  markets. 

Using  COTS  devices  in  aerospace  applications  implies  compromise.  Aerospace 
companies  gain  the  technological  increases  seen  in  the  commercial  semiconductor 
market  and  realize  a  reduced  cost  from  the  use  of  mass  produced  devices.  Drawbacks 
include not having devices designed to their specific needs. After the Perry memo,
COTS devices were shown to have acceptable reliability margins for most aerospace
applications  [1].  With  market  and  technology  pressures  reducing  reliability  margins, 
future  device  lifetime  may  be  inadequate.  What  this  means  is  the  aerospace  industry 
will  have  to  adapt  the  available  COTS  devices  to  their  needs  or  adapt  their  practices  to 
the  capabilities  of  the  available  devices.  One  way  to  accomplish  this  is  to  extend  the 
lifetime  of  COTS  devices.  This  involves  reducing  the  stress  that  causes  the  activation 
of  each  of  the  potential  failure  mechanisms.  Derating  a  component  from  its  rated 
operating parameters is one method to reduce stress. This is a favored method of
extending the lifetime of many types of mechanical and electronic components;
however, there is little discussion in the literature of derating ICs to extend their lifetime.
How  this  may  be  accomplished  is  the  subject  of  the  second  iteration  of  my  research. 
The  results  are  presented  in  this  chapter. 

4.1 Lifetime Models: Constant Failure Rate Justification

To  begin  to  understand  how  to  increase  lifetime,  it  is  first  necessary  to  understand  the 
reliability behavior of semiconductor devices. Much of this work was completed in the
last chapter, but derating requires more information, including a better
understanding of the distribution models that may be used to represent intrinsic IC
failures. This understanding was developed in two ways: first by examining field failure data from
avionics systems, and second through a theoretical argument. These two approaches
demonstrate that it is appropriate to use a constant failure rate (exponential distribution)
model to approximate the lifetime of semiconductor devices for the purpose of
predicting their lifetime improvement from derating.

The  use  of  the  exponential  distribution  to  model  the  reliability  of  electronic 
components  has  long  been  a  source  of  contention.  Historically  the  exponential 
distribution has been used, and overused, to model electronic reliability. The failure
rate  during  the  ‘useful’  life  portion  of  many  electronic  systems  and  components  is  often 
constant.  Not  only  does  the  exponential  distribution  model  this  behavior,  the  model 
itself  is  very  easy  to  work  with  and  manipulate  mathematically.  This  ease  of  use  is  so 
great  that  it  results  in  a  constant  rate  assumption  being  made  even  when  it  is 
inappropriate  and  experimental  data  doesn’t  support  its  use. 

4.1.1  Empirical  Evidence 

Analysis of empirical field data provides a glimpse of the failure behavior of in-service
avionics  systems.  Examination  of  existing  systems,  while  not  a  direct  predictor  of  the 
behavior  of  new  systems,  does  provide  insight  into  how  they  will  behave.  This  section 
looks  at  two  sources  of  empirical  evidence.  The  first  is  a  study  of  field  failure  data 
provided  by  members  of  the  AVSI  Project  #17  team.  The  second  set  of  empirical 
evidence  comes  from  an  older  study  by  United  Airlines  on  component  hazard  rate 
shapes.

4.1.1.1 Analysis of Avionics Failure Data


Two  members  of  the  AVSI  Project  #17  team  supplied  in-service  field  failure  data.  The 
data included the return-for-service records for eight different systems. Data from each
record  included  serial  number,  date  sold,  date  it  was  returned  for  service,  replaced  IC 
types  and  quantities. 

The principal investigator for this analysis was Jin Qin [56]. As is typical with field
data,  some  of  the  original  data  was  incomplete  or  invalid.  Qin  reviewed  the  original 
data  and  discarded  those  records  which  didn’t  contain  sufficient  information  to 
determine  hours.  After  review,  the  database  contained  records  for  18,176  systems  sold 
between  17  August  1982  and  30  December  2001.  Table  4.1  lists  the  size  of  the  sample 
population  by  service  year  and  number  of  failures  for  systems  labeled  A-H.  Qin  built 
the  table  using  the  following  assumptions: 

1.  Systems  were  grouped  by  the  year  they  entered  service. 

2.  Records  without  enough  information  to  determine  service  life  were  eliminated. 

3.  Only  the  time  to  first  failure  was  calculated. 

4. The censoring time was 30 April 2002.

The statistical analysis of the data was accomplished in four steps. The first step was
probability plotting of the data. Qin analyzed the data using a Weibull distribution, which he
selected because of its wide use in the electronics industry to represent
electronic component failure rates. Using a goodness-of-fit test, Qin confirmed the
Weibull was indeed an appropriate lifetime model for the field data.

The  second  step  was  to  estimate  the  Weibull  parameters  using  the  maximum 
likelihood  estimation  (MLE)  technique.  The  critical  result  from  this  analysis  was  that 
the β parameter, the shape parameter, was close to one for each data subset.

This led to the next step, verification of the hypothesis that the data fit an
exponential distribution¹. Qin conducted a likelihood ratio test, with a significance
level of 0.05, to test the hypothesis. The results, listed in Table 4.2, showed that the
exponential distribution fit most of the data subsets.
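
The idea behind such a test can be sketched in Python. This is not Qin's procedure: the field data were censored, whereas the sketch below assumes complete (uncensored) failure times for simplicity, and all function names are hypothetical. It fits a Weibull by maximum likelihood, fits an exponential (β fixed at one), and compares twice the log-likelihood difference against 3.841, the chi-square critical value for one degree of freedom at a significance level of 0.05.

```python
import math


def weibull_mle(times, lo=0.01, hi=100.0, iters=200):
    """Maximum likelihood Weibull (beta, eta) for complete, uncensored data."""
    n = len(times)
    scale = max(times)
    x = [t / scale for t in times]            # rescale so powers cannot overflow
    mean_log = sum(math.log(v) for v in x) / n

    def score(beta):
        w = [v ** beta for v in x]
        weighted = sum(wi * math.log(v) for wi, v in zip(w, x)) / sum(w)
        return weighted - 1.0 / beta - mean_log

    for _ in range(iters):                     # bisection; score increases with beta
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = scale * (sum(v ** beta for v in x) / n) ** (1.0 / beta)
    return beta, eta


def weibull_loglik(times, beta, eta):
    """Weibull log-likelihood of complete data."""
    return sum(math.log(beta / eta) + (beta - 1.0) * math.log(t / eta) - (t / eta) ** beta
               for t in times)


def lr_test_exponential(times, critical=3.841):
    """Likelihood ratio test of the hypothesis beta = 1 (exponential)."""
    beta, eta = weibull_mle(times)
    mean = sum(times) / len(times)             # exponential MLE of the scale
    stat = 2.0 * (weibull_loglik(times, beta, eta) - weibull_loglik(times, 1.0, mean))
    return beta, stat, stat <= critical        # True corresponds to "accept"


# Hypothetical, tightly clustered failure times: clearly not exponential
print(lr_test_exponential([9.0, 10.0, 11.0, 10.5, 9.5, 10.2, 9.8, 10.1]))
```

A data subset would be marked "accept" when the statistic falls below the critical value, meaning the extra shape parameter does not significantly improve the fit over the exponential.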


Table  4.2:  Field  Data  Hypothesis  Test.  A:  accept  the  hypothesis  and  R:  reject  the 
hypothesis [56].

The last part of the analysis involved examining trends in the system failure rates.
As shown in Figure 4.1, systems D, E, F and G show an increasing trend after 1994.
System A shows this trend as well after 1998, but system H demonstrated a decreasing
failure rate trend. The assumption in this analysis is that items with a later service
entry used newer devices with smaller geometries.

¹ The exponential is a special case of the Weibull distribution where the shape parameter (β) equals
one.

Figure 4.1: System Failure Rate Trends [56].


The conclusions drawn from this analysis were that the constant failure rate model, or
exponential distribution, is an appropriate model to use for in-service avionic
semiconductor devices, and that the failure rate showed an increasing trend after 1994,
confirming that the fears involved with shrinking device features are reasonable.

4.1.1.2 “Bathtub Curve Fallacy”

In the late 1960s, United Airlines examined the age-related reliability patterns of the
non-structural components of its aircraft fleet [57]. Using field failure rate data, they
derived the hazard rate for different subsystems as a function of time. They found items
that fit all types of lifetime failure rate curves including:

• Bathtub curve.


•  A  constant,  or  slowly  increasing,  hazard  rate  followed  by  a  pronounced  wearout 
region. 

•  A  slowly  increasing  failure  rate  with  no  identifiable  wearout  region. 

•  A  low  initial  failure  probability  followed  by  a  quick  rise  to  a  constant  hazard  rate. 

•  A  constant  hazard  rate. 

•  Infant  mortality  followed  by  a  constant,  or  slowly  increasing,  hazard  rate. 

The study had several surprising conclusions. First, only six percent of aircraft
components experienced aging and wearout. This means most components did not
follow the representational bathtub curve. A majority of components (68%),
particularly electronic components, demonstrated a period of infant mortality followed
by a constant, or slowly increasing, failure rate.
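
The hazard rate shapes described above can be illustrated with the Weibull hazard function, whose shape parameter selects a decreasing, constant or increasing rate. The Python sketch below is illustrative only: the β values, the characteristic life of 1000 hours and the labels are assumptions, not values from the United Airlines study.

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


# beta < 1: decreasing rate, beta = 1: constant rate, beta > 1: increasing rate
for beta, label in ((0.5, "infant mortality"), (1.0, "constant rate"), (3.0, "wearout")):
    rates = [weibull_hazard(t, beta, 1000.0) for t in (100.0, 500.0, 900.0)]
    print(label, ["%.2e" % r for r in rates])
```

A bathtub curve corresponds to these three regimes occurring in sequence; a component that shows only the first two matches the majority pattern reported in the study.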

By itself, this study doesn’t validate the use of the exponential model. The greatest
shortcoming is that the technology used in this study is significantly different from that
found in today’s, let alone future, aircraft. But the results do add support to the
justification of a constant failure rate for avionics since the study covered
electronic components operating in a similar environment.

4.1.2 Why a Constant Failure Rate Model is Justified

Empirical  evidence  provides  a  good  indication  that  a  constant  rate  model  may  be 
appropriate  for  the  derating  model,  but  this  evidence  is  not  conclusive.  There  are  still 
many  arguments  against  the  exponential  distribution.  On  the  surface,  one  would  think 
the exponential is not an appropriate distribution since it doesn’t model either the infant
mortality or wearout phases. Intellectually, it does not seem to make sense to assume
a constant failure rate since this doesn’t account for aging. However, with a
combination of failure mechanisms, each having a unique failure rate and distribution
with a low rate of occurrence of those failures, it has become difficult to distinguish an
early intrinsic wearout failure in a device from a random failure. Random failures, by
their nature, arrive at a constant rate.

4.1.2.1 Purpose, Scope and Assumptions

The first step in justifying the constant rate model is defining the purpose of the model. All
models are nothing more than approximate representations of reality. The
appropriateness and usefulness of a model depend on its intended purpose and the
available inputs to the model. Derived from the primary purpose of accommodating
shorter lifetime devices in long-life applications, the purpose of the lifetime models for
electromigration, hot carrier effects and TDDB in this report is to:

Model the change in semiconductor device lifetime, for a device
operated at derated conditions, from electromigration, hot carrier and
TDDB failure mechanisms. The models shall be usable by AVSI Project
#17 members, with data available to them (either from the device
manufacturers or via accelerated life testing) for the purpose of estimating
lifetime improvement from device derating.

The scope of the model and necessary assumptions are implied by the model’s purpose.
The scope of the derating model is to make lifetime predictions and lifetime
improvement estimates resulting from operating a semiconductor device at derated
conditions.

The  first  assumption  is  it  is  not  necessary,  nor  pertinent  to  this  research,  to  model 
the failure rate over the entire lifespan of the devices. The focus of the derating model
is  on  mean  lifetime. 

The  second  assumption  is  that  it  does  not  require  a  model  with  a  high  degree  of 
accuracy.  High  accuracy  would  require  extensive  knowledge  of  individual  IC  designs, 
information that is proprietary to the semiconductor manufacturers and information they
are  unlikely  to  share  with  their  aerospace  customers.  It  would  also  require  large  and 
complex  models  which  would  defeat  the  purpose  of  a  model  that  could  be  applied  by 
the  aerospace  industry  to  estimate  device  lifetime  after  derating. 

4.1.2.2 Failure Mechanism Lifetime Models

With  the  model  requirements,  scope  and  assumptions  defined,  the  next  step  is  to 
examine the individual failure mechanism lifetime models. Each of the three intrinsic
failure mechanisms (electromigration, hot carrier effects and TDDB) has a different
failure rate distribution, as previously discussed in Section 3.4. Electromigration is
generally accepted to follow a lognormal failure distribution and TDDB a Weibull
distribution. Published literature on hot carrier effects had little discussion about an
appropriate failure distribution for that mechanism, though the published lifetime
models assumed the use of the exponential distribution [22].

Estimating TDDB lifetime using a constant failure rate model is the easiest to
justify. Several researchers have shown the Weibull shape parameter decreasing as the
gate oxide thickness decreases [46, 51, 52]. Future semiconductor devices will have
small dimensions with thinner oxides, making the exponential model (a Weibull with
β = 1) an appropriate lifetime model.

Hot carrier effects don’t have an accepted lifetime distribution model. Given a lack
of evidence supporting, or suggesting, any other model, the exponential is a
reasonable assumption. A formal justification for this approach is the law of large
numbers and complex systems demonstrating constant failure rates (see Sec. 4.1.2.3).

Electromigration is the hardest mechanism for which to justify a constant failure rate
model, since it is definitely an age-related wearout mechanism and is commonly accepted
to follow the lognormal distribution. The justification rests on the requirement to model
the mean device lifetime. There is no need for the derating model to model the entire
lifespan of the system. As device manufacturers define wearout as a percentage of units
failed, it is the failure during the useful life portion of the system that is of interest.

The lognormal model has been the accepted model for electromigration lifetime
simply because it fits the data well [58]. Conceptually, the lognormal distribution has
problems since it cannot be scaled with changing line length. Some authors, such as
Lloyd [58], argue the Weibull distribution is a better model since electromigration
failure is a ‘weakest link’ problem. Electromigration is considered a weakest link
problem  since  a  device’s  metallization  is  made  up  of  a  series  of  interconnecting  links 
joining  device  features.  Each  link  has  a  random  inherent  strength  and  the  first  to  break 
results  in  failure  of  the  chain.  As  the  total  length  and  quantity  of  interconnects  increase 
there  are  more  links  in  the  chain  which  may  break,  increasing  the  chance  of  failure. 

The form of the cumulative distribution function (CDF) for a weakest link model is

\[ F_N(s) = 1 - \left(1 - F_1(s)\right)^{N} \tag{4.1} \]

where $F_1(s)$ is the CDF for the strength of an individual link and $N$ is the number of
individual links. The Weibull distribution models this situation, opening the potential
for a constant rate assumption if β is low enough.

Other authors, such as Gall [34], have argued against using the Weibull. However,
Gall’s argument was that he found the Weibull slope to be decreasing as the number of
possible failure links in a device increases. This argument, while increasing doubt as to
the appropriateness of the Weibull, does advance the argument for justifying the use of
the exponential. The number of transistors in future devices is expected to continue
following Moore’s Law, doubling every eighteen to twenty-four months. This means
the number of links subject to electromigration doubles, pushing β lower, and making
the exponential distribution an increasingly better approximation of electromigration
lifetime.


4.1.2.3 Complex System Lifetime Models

Semiconductor devices are highly complex systems with millions of individual
components (as an example, the Intel Pentium IV processor has 55 million transistors
[3]). Complex systems tend to exhibit constant failure rates [26]. Drenick explained
this behavior within his limit theorem. He stated the reliability of a system approaches
the limit given by the survival function $S(t) = e^{-\lambda t}$ as the system becomes increasingly
complex. Abernathy adds further evidence, stating that as the number of failure
modes in a system increases to five or more, the Weibull shape parameter (β) will tend
toward one unless all the modes have the same β and similar characteristic life.

An example of how increasing complexity results in a constant failure rate is Gall’s
observation of the decrease in Weibull slope as the number of possible electromigration
failure links in a device increases. Each of those links has a strength associated with it.
That strength will vary with some distribution based on variables from design and
manufacture. The stress each link will see is also a random variable, again based on the
device’s design and manufacture. It is possible for the strength distribution for the links
to have outliers due to random non-lethal defects introduced during device
manufacture. This combination of random strengths and stresses, along with the
possibility of some lower strength links, produces a large spread in the probability
density function (PDF) for failure of the weakest link. With enough links the PDF looks
constant.

In short, complex systems fail with a constant failure rate because of the law of
large numbers. In a weakest link system, failure can occur at any point as long as there
is a random distribution associated with the failure time of each individual link. Even if
the failure distribution of an individual link has a small variance, with a sufficiently large
number of links there is a probability of a link failing at any given point. With a very
large number of links the probability of failure at any given point is constant. This
argument alone justifies the use of a constant failure rate for semiconductor devices with
tens of millions of device features.

4.1.2.4 Constant Failure Rate Summary

Combining all these pieces of evidence provides a strong case for making the constant
failure rate assumption. First, field data has demonstrated a good fit to Weibull
distributions with shape parameters ($\beta$) of about one. Second, an examination of the
individual failure mechanisms shows them to have lower Weibull slopes as feature
sizes decrease and device complexity increases. This agrees with the notion that
systems will inherently demonstrate a more constant failure rate as complexity
increases. With more complex semiconductor devices it will be difficult to distinguish
any inherent wearout failures from random failures.

4.2 Lifetime Enhancement Through Derating


As shown in Section 3.2.1.1, it is possible to alter a semiconductor device’s lifetime by
changing its operating parameters. Two parameters that a designer using a semiconductor
device may control are junction temperature, because of heat activated mechanisms,
and supply voltage. A semiconductor device’s operating voltage ($V_{dd}$) directly affects
many of its parameters. These include current density ($j_e$) and the electric field
($\mathcal{E}_{ox}$) across the gate dielectric. Supply voltage also has a significant effect
on junction temperature ($T_j$).

Junction temperature ($T_j$) is the internal operating temperature of a device. It is
dependent on the power dissipated from the device ($P_D$), the ambient operating
temperature ($T_a$) and the sum of the thermal impedances between the die and ambient
environment ($\theta_{ja}$). An engineer can exercise some control over each of these
factors in a system design.

The  relationship  for  determining  the  junction  temperature  is  [59] 

$$T_j = \theta_{ja} P_D + T_a \tag{4.2}$$

with power dissipated determined by [60]

$$P_D = K C V_{dd}^2 f + i_l V_{dd} \tag{4.3}$$

where $V_{dd}$ is the supply voltage, $f$ is the switching frequency, $K$ is the switching
factor and $C$ is the average node capacitance. The power dissipated is the sum of both
static and dynamic power dissipation. In CMOS circuits, dynamic power is the dominant
factor, accounting for at least 90% of the power dissipation [61]. Thus a first order
approximation of power dissipation is

$$P_D \approx P_{dynamic} = C_{eff} V_{dd}^2 f \tag{4.4}$$

where $C_{eff}$ combines the physical capacitance and activity (number of active nodes) to
account for the average capacitance charged during each $1/f$ period.
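As a quick numerical check of Eqs. 4.2 and 4.4, the following Python sketch evaluates the
first order power estimate and the resulting junction temperature. The component values
(1 nF effective capacitance, 100 MHz clock, 40 °C/W thermal impedance) are hypothetical
and serve only to show the arithmetic.

```python
def dynamic_power(c_eff, v_dd, freq):
    """Eq. 4.4: first order dynamic power in watts (C_eff in F, V_dd in V, f in Hz)."""
    return c_eff * v_dd ** 2 * freq


def junction_temperature(theta_ja, power, t_ambient):
    """Eq. 4.2: junction temperature in degC (theta_ja in degC/W, power in W)."""
    return theta_ja * power + t_ambient


# Hypothetical operating point, not taken from any data sheet.
p_d = dynamic_power(c_eff=1.0e-9, v_dd=3.3, freq=100e6)
t_j = junction_temperature(theta_ja=40.0, power=p_d, t_ambient=20.0)
print(p_d, t_j)
```

Because power scales with the square of the supply voltage, even a modest voltage
reduction lowers the junction temperature noticeably in this estimate.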

While this shows that $V_{dd}$ has a direct impact on junction temperature, $V_{dd}$ has a
further impact in that frequency scales with it as well. In a CMOS circuit, a
reduction in $V_{dd}$ results in a near linear reduction in operating frequency (that is,
an increase in circuit delay). This is represented by [62]

$$f = \frac{(V_{dd} - V_{th})^2}{V_{dd}} \cdot \frac{f_{max} V_{dd,max}}{(V_{dd,max} - V_{th})^2} \tag{4.5}$$

$V_{th}$ is the threshold voltage and $f_{max}$ and $V_{dd,max}$ are the maximum operating
frequency and voltage respectively.

To determine junction temperature in relation to the supply voltage, Eqs. 4.4 and
4.5 are substituted into Eq. 4.2,

$$T_j = C_{eff}\,\theta_{ja} V_{dd}^2 \, \frac{(V_{dd} - V_{th})^2}{V_{dd}} \cdot \frac{f_{max} V_{dd,max}}{(V_{dd,max} - V_{th})^2} + T_a \tag{4.6}$$

This equation is simplified by estimating the combined value of $C_{eff}$ and $\theta_{ja}$
after setting $T_j$ and $V_{dd}$ in Eq. 4.6 to $T_{j,max}$ and $V_{dd,max}$ respectively and
solving for $C_{eff}\theta_{ja}$,

$$C_{eff}\theta_{ja} = -\frac{T_a - T_{j,max}}{f_{max} V_{dd,max}^2} \tag{4.7}$$

Substituting this into Eq. 4.6 results in

$$T_j = T_a - \frac{V_{dd}(V_{dd} - V_{th})^2 (T_a - T_{j,max})}{V_{dd,max}(V_{dd,max} - V_{th})^2} \tag{4.8}$$

Eq. 4.8 assumes that frequency is reduced as the supply voltage is decreased as shown
in Figure 4.2. This is the first step in the determination of lifetime enhancement. The
next step is to relate the other failure mechanism drivers for electromigration and
TDDB to $V_{dd}$.
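The voltage dependence of frequency and junction temperature can be sketched in Python
as follows. The functions follow the forms of Eqs. 4.5 and 4.8 as read here, with
temperatures in °C. The function names and the example operating point are illustrative
assumptions.

```python
def scaled_frequency(v_dd, v_th, v_dd_max, f_max):
    """Eq. 4.5: operating frequency at a reduced supply voltage."""
    return f_max * ((v_dd - v_th) ** 2 / v_dd) * (v_dd_max / (v_dd_max - v_th) ** 2)


def derated_junction_temp(v_dd, v_th, v_dd_max, t_j_max, t_a):
    """Eq. 4.8: junction temperature (degC) when V_dd and f are reduced together."""
    ratio = (v_dd * (v_dd - v_th) ** 2) / (v_dd_max * (v_dd_max - v_th) ** 2)
    return t_a - ratio * (t_a - t_j_max)


# Example: a 3.3 V part with V_th = 0.8 V run at 90% of rated voltage.
v = 0.9 * 3.3
print(scaled_frequency(v, 0.8, 3.3, 100e6))
print(derated_junction_temp(v, 0.8, 3.3, t_j_max=85.0, t_a=20.0))
```

At the rated voltage both functions return the rated values, which is a convenient sanity
check on the algebra.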


Figure  4.2:  Tj  given  Vdd  Scaling  Multiplier 


For electromigration, current density ($j_e$) drives wearout. It is related to $V_{dd}$ by

$$j_e = \frac{V_{dd}}{A \times R} \tag{4.9}$$


where $A$ is the cross-sectional area of the interconnect and $R$ is the resistance. For
TDDB, the oxide field is related to $V_{dd}$ via

$$\mathcal{E}_{ox} \propto \frac{V_{dd}}{t_{eff}} \tag{4.10}$$

where $t_{eff}$ is the effective thickness of the oxide layer.

4.2.1 Derating Factor


While the use of the term Acceleration Factor is appropriate in accelerated life testing,
it is not an intuitive term to use when discussing derating devices in order to enhance
their lifetime. So we will use the term Derating Factor ($D_f$). This is equivalent to
$A_f$, but is defined as the ratio of the measured MTTF of devices operating at derated
conditions to the measured MTTF of identical devices at their manufacturer rated
operating conditions, to more appropriately reflect its use. Symbolically this is,

$$D_f = \frac{MTTF_{derated}}{MTTF_{rated}} \tag{4.11}$$


Using this definition, our desired values for $D_f$ are greater than one ($D_f > 1$), with
larger values providing a longer operational life. Thus derated lifetime is


$$MTTF_{derated} = D_f \times MTTF_{rated} \tag{4.12}$$


4.2.2  Modeling  Voltage  Derating 


The effects of combined voltage and frequency derating may be seen by substituting
Eq. 4.8 into the respective failure time models for each of the wearout failure
mechanisms. Using the equations defined earlier, the derating factors for
electromigration, hot carrier degradation and TDDB are respectively,

$$D_{f_{EM}} = \left(\frac{V_{dd,max}}{V_{dd}}\right)^n \exp\left[\frac{E_{a_{EM}}}{k}\left(\frac{1}{T_j} - \frac{1}{T_{j,max}}\right)\right] \tag{4.13}$$

$$D_{f_{HCD}} = \exp\left[B\left(\frac{1}{V_{dd}} - \frac{1}{V_{dd,max}}\right)\right] \tag{4.14}$$

$$D_{f_{TDDB}} = \exp\left[\gamma\,\mathcal{E}_{ox}\left(1 - \frac{V_{dd}}{V_{dd,max}}\right)\right] \exp\left[\frac{E_{a_{TDDB}}}{k}\left(\frac{1}{T_j} - \frac{1}{T_{j,max}}\right)\right] \tag{4.15}$$

For electromigration and hot carrier degradation, an advantage of having the
derating factor in terms of voltage is a reduction in the number of input parameters
required. For each of the wearout mechanisms, the threshold voltage ($V_{th}$) and rated
operating voltage ($V_{dd,max}$), frequency ($f_{max}$) and temperature ($T_{j,max}$) are
needed. For electromigration, the additional parameters required are activation energy
($E_{a_{EM}}$) and $n$. The current density ($j_e$) is not required. For HCD, the activation
energy ($E_{a_{HCD}}$) and $N$ are similarly required, eliminating the need for substrate
current ($i_{sub}$). The derating factor for oxide breakdown unfortunately does not reduce
the number of parameters needed, requiring the activation energy ($E_{a_{TDDB}}$),
$\gamma$ and the oxide field ($\mathcal{E}_{ox}$).
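The per-mechanism derating factors can be sketched in Python as below. This is a sketch
under stated assumptions rather than a definitive implementation. It assumes the forms of
Eqs. 4.13 to 4.15 as reconstructed here: a power law in voltage with an Arrhenius term for
electromigration, an exponential in $1/V_{dd}$ for hot carrier degradation, and a field term
with an Arrhenius term for TDDB. The default parameter values are those listed for
Figure 4.3, temperatures are supplied in °C and converted to kelvin, and the function
names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def _arrhenius(e_a, t_j_c, t_j_max_c):
    """exp[(E_a / k)(1/T_j - 1/T_j,max)], with temperatures given in degC."""
    t_j = t_j_c + 273.15
    t_j_max = t_j_max_c + 273.15
    return math.exp((e_a / K_BOLTZMANN_EV) * (1.0 / t_j - 1.0 / t_j_max))


def df_em(v_dd, v_dd_max, t_j_c, t_j_max_c, e_a=0.8, n=2.0):
    """Assumed form of Eq. 4.13 (electromigration)."""
    return (v_dd_max / v_dd) ** n * _arrhenius(e_a, t_j_c, t_j_max_c)


def df_hcd(v_dd, v_dd_max, b=70.0):
    """Assumed form of Eq. 4.14 (hot carrier degradation)."""
    return math.exp(b * (1.0 / v_dd - 1.0 / v_dd_max))


def df_tddb(v_dd, v_dd_max, t_j_c, t_j_max_c, e_a=0.75, gamma=3.0, e_ox=4.0):
    """Assumed form of Eq. 4.15 (TDDB); e_ox in MV/cm, gamma per MV/cm."""
    field_term = math.exp(gamma * e_ox * (1.0 - v_dd / v_dd_max))
    return field_term * _arrhenius(e_a, t_j_c, t_j_max_c)


# Example: 90% of a 3.3 V rating, with T_j taken from Eq. 4.8 (about 64 degC).
print(df_em(2.97, 3.3, 64.08, 85.0))
print(df_hcd(2.97, 3.3))
print(df_tddb(2.97, 3.3, 64.08, 85.0))
```

Each function returns one at the rated operating point and a value greater than one when
the voltage and junction temperature are reduced.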

The overall derating factor ($D_f$) involves combining the derating factors for each of
the failure mechanisms. The derivation of combining these derating factors begins with
the hazard rate. A system’s hazard rate is the sum of the individual failure modes’
hazard rates [20]. Assuming exponential distributions, $h_i(t) = \lambda_i = \frac{1}{MTTF_i}$. Given

$$\lambda = \sum_{i=1}^{n} \lambda_i \tag{4.16}$$

where the index $i$ refers to each of the $n$ failure mechanisms in turn. The failure rate of
a derated IC is

$$\lambda_{derated} = \sum_{i=1}^{n} \lambda_{derated,i} = \sum_{i=1}^{n} \frac{\lambda_i}{D_{f_i}} \tag{4.17}$$


The derating factor is determined by substituting Eqs. 4.16 and 4.17 in Eq. 4.11, with
$MTTF = 1/\lambda$ under the exponential assumption,

$$D_f = \frac{\lambda}{\sum_{i=1}^{n} \frac{\lambda_i}{D_{f_i}}} \tag{4.18}$$

In the case of the three wearout mechanisms discussed here

$$D_f = \frac{\lambda}{\frac{\lambda_{EM}}{D_{f_{EM}}} + \frac{\lambda_{HCD}}{D_{f_{HCD}}} + \frac{\lambda_{TDDB}}{D_{f_{TDDB}}}} \tag{4.19}$$

where $\lambda$ can either represent the total failure rate or the sum of the failure rates
of the wearout mechanisms. This will result in two different answers, the total derating
factor and wearout derating factor respectively. Figure 4.3 shows the derating factor
achieved for each of the mechanisms and the total wearout derating.
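The combination rule can be sketched in a few lines of Python. The sketch covers the
wearout derating factor only, taking $\lambda$ as the sum of the supplied mechanism failure
rates. The function name and the numbers in the example are hypothetical.

```python
def combined_derating_factor(rates, factors):
    """Eq. 4.18 with lambda taken as the sum of the listed mechanism rates.

    rates   -- failure rate of each mechanism at rated conditions
    factors -- derating factor of each mechanism, in the same order
    """
    if len(rates) != len(factors):
        raise ValueError("rates and factors must have the same length")
    total_rate = sum(rates)
    derated_rate = sum(r / d for r, d in zip(rates, factors))  # Eq. 4.17
    return total_rate / derated_rate


# Equal failure rates for EM, HCD and TDDB with illustrative derating factors.
print(combined_derating_factor([1.0, 1.0, 1.0], [2.0, 4.0, 4.0]))
```

The result is a rate-weighted harmonic mean of the individual factors, so the mechanism
that benefits least from derating tends to limit the overall gain.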


Figure 4.3: $D_f$ versus $D_{voltage}$ (horizontal axis: $D_{voltage}$ from 0.8 to 1.2).
$\lambda_{EM} = \lambda_{TDDB} = \lambda_{HCD}$, $T_{j,max} = 85$°C, $T_a = 20$°C,
$V_{dd,max} = 3.3$ V, $V_{th} = 0.8$ V, $E_{a_{EM}} = 0.8$ eV, $n = 2$, $B = 70$,
$E_{a_{TDDB}} = 0.75$ eV, $\mathcal{E}_{ox} = 4$ MV/cm, $\gamma = 3$ Naperians per MV/cm.


To highlight the effects of the changing temperature and frequency, Figure 4.4
shows how the derating factor changes if voltage alone is altered, leaving
temperature and frequency constant. This demonstrates that reducing the operating
voltage alone produces some lifetime extension benefits. It also shows how a system
designer may make design trade-offs between ambient temperature, cooling and operating
voltage to achieve differing levels of device performance and lifetime.


Figure 4.4: $D_f$ versus $D_{voltage}$ with constant operating temperature and frequency
(horizontal axis: $D_{voltage}$ from 0.8 to 1.2). $\lambda_{EM} = \lambda_{TDDB} = \lambda_{HCD}$,
$T_j = 85$°C, $T_a = 20$°C, $V_{dd,max} = 3.3$ V, $V_{th} = 0.8$ V, $E_{a_{EM}} = 0.8$ eV,
$n = 2$, $B = 70$, $E_{a_{TDDB}} = 0.75$ eV, $\mathcal{E}_{ox} = 4$ MV/cm, $\gamma = 3$
Naperians per MV/cm.

Furthermore, the differences between Figures 4.3 and 4.4 highlight a concern about
applying thermal acceleration alone in accelerated life testing. Because of the low
failure rates of semiconductor devices, a device’s failure rate is normally determined
through accelerated life testing. The failure rate of devices at accelerated conditions is
determined and then extrapolated back to at-use conditions, using an acceleration factor,
in order to approximate an MTTF. So when accelerated life testing is used to determine
the rated lifetime of a device, care must be taken to ensure that all the relevant failure
mechanisms are accelerated in order to make a reasonable extrapolation of the device’s
failure rate.
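The extrapolation step can be illustrated with the standard Arrhenius acceleration
factor, sketched here in Python. The activation energy, temperatures and test MTTF are
hypothetical. The sketch covers thermal acceleration only, which is exactly the limitation
noted above: mechanisms driven mainly by voltage are not captured by it.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(e_a, t_use_c, t_stress_c):
    """Thermal acceleration factor between use and stress temperatures (degC)."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((e_a / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))


def mttf_at_use(mttf_stress_hours, acceleration_factor):
    """Extrapolate an MTTF measured under stress back to use conditions."""
    return mttf_stress_hours * acceleration_factor


# Hypothetical test: 0.8 eV mechanism stressed at 125 degC, used at 55 degC.
a_f = arrhenius_acceleration(0.8, t_use_c=55.0, t_stress_c=125.0)
print(a_f, mttf_at_use(1000.0, a_f))
```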

4.2.3  Support  for  Derating 

While derating of electrical components to increase their reliability has been common
practice [63], there is not much discussion of derating semiconductor devices in the
literature. Most likely this is because ICs have demonstrated acceptable reliability in
most applications [1] and all their available performance is invariably used.
However, NASA has conducted some studies on derating for the purpose of reduced
power consumption [64].

NASA conducted a study to identify flightworthy 3.3 V state-of-the-art semiconductor
devices. The study focused on testing COTS 5 V CMOS static random access
memories, from three suppliers, to see if they would function at 3.3 V. The most
successful result showed one supplier’s device would function over a voltage range of
3.3 V ±10% and from -55°C to +125°C. This demonstrates it is possible to operate
devices at a lower than specified operating voltage.

In  their  conclusion,  NASA  stated  many  existing  devices  have  the  potential  to  be 
operated  at  lower  than  nominally  specified  voltages.  While  NASA  conducted  this  study 
to  validate  reducing  the  operating  voltage  for  reduced  power  consumption,  this 
conclusion  implies  it  will  be  possible  to  reduce  operating  voltage  for  the  purpose  of 
increasing lifetime as well.

4.3 Conclusions


The business climate is forcing the military/aerospace industry to make increased use
of COTS semiconductor devices. With the strong possibility of future semiconductor
devices having inadequate lifetimes for long-life aerospace applications, the aerospace
industry requires a method to incorporate those devices in its systems. One way to
accomplish this is through lifetime enhancement by derating the operating voltage of
the devices. While derating will reduce the performance of a device, the resulting
reduction of operating stresses within the device will reduce its rate of failure and
extend its lifetime.

Chapter 5


Summary 

5.1  Results 

This research was conducted in support of AVSI Project #17, Methods to Account for
Accelerated Semiconductor Device Wearout. The purpose of this project is to understand
the impact of shrinking device features, their implications for device lifetime and the
potential for device wearout. This project grew out of a concern in the aerospace
industry that as semiconductor device feature sizes continue to shrink, the reliability
margins in new devices will no longer be sufficient to ensure they have adequate
lifetime for long-life applications.

Potentially inadequate lifetime is a result of a combination of business and
technological factors. The technological factors are derived from the decreasing size of
device features. Smaller features, along with an increasing number of transistors, will
reduce device reliability. Customers expect device manufacturers to continue to
produce reliable products, even with smaller features. This is where market conditions
affect the aerospace industry. Aerospace companies must use COTS devices in their
systems. As a very small portion of the semiconductor market, the aerospace industry
doesn’t have the purchasing clout to strongly influence device requirements. Thus
when balancing reliability versus performance parameters, device manufacturers will
only ensure they achieve the reliability requirements of their primary customers, the
computer, networking, telecommunications and consumer electronics industries. As a
result, reliability of future devices is not guaranteed to meet the requirements of
challenging aerospace applications.

Evidence in both published literature and analysis of field failure data shows there
is cause to be concerned with increased failure rates from electromigration, hot carrier
effects and TDDB as feature sizes decrease. A review of the failure mechanisms,
lifetime and lifetime distributions for these mechanisms shows the potential for
increased failure rates. Combined with the business climate and market conditions, the
potential for decreased device lifetime means the aerospace industry must adapt its
systems to account for this limitation.

One way in which short lifetime devices may be accommodated is by extending
their lifetime. This may be accomplished by derating the devices, operating them at a
lower than nominal voltage. This reduced operating voltage will extend lifetime
through two mechanisms. First, reducing voltage has a direct effect on lifetime for
electromigration (dependent on $j_e \propto V_{dd}$), hot carrier effects (dependent on
$V_{dd}$) and TDDB (dependent on $\mathcal{E}_{ox}$). Second, in addition to directly
increasing lifetime, lowering the operating voltage decreases the operating temperature.
This increases lifetime via the Arrhenius relationship, specifically for the
electromigration and TDDB mechanisms. The drawbacks to voltage derating are the
qualification of the devices for use and the reduction in operating frequency, leading to
reduced device performance.

5.2  Future  Work 

The work on this project is ongoing. This report presents the work accomplished to
date. Three of the project Work Packages/Milestones are addressed in this report.
These are:

1.  Determine  Likely  Failure  Mechanisms  of  Future  Semiconductor  Devices  in 
Avionics  Applications. 

2.  Develop  Models  to  Estimate  Expected  Lifetimes  of  Future  Avionics. 

3.  Develop  Device  Assessment  Methods  and  Avionics  System  Design  Guidelines. 

The  remaining  Work  Packages/Milestones,  along  with  what  has  been  learned  so  far, 
serve  as  the  basis  for  future  work. 

5.2.1 Verification and Validation of the Derating Model

The next Work Package/Milestone for ENRE (Reliability Engineering, University
of Maryland) to complete is Verify Models. At this point the derating model is still only
theoretical. Experimental work has to be accomplished to verify and validate the
model. The items that need to be verified are:

• Verify and validate the derating model. Confirm the derating model makes
accurate predictions of lifetime improvement for a given amount of voltage
decrease. This includes verification of the constant failure rate assumption.

•  Verify  derated  devices  remain  functional.  The  device  operating  characteristics 
and  performance  reduction  need  to  be  characterized. 

This  work  has  already  been  started  at  Maryland.  The  results  of  this  effort  will  support 
the  next  future  work  tasks. 

5.2.2  Derated  Device  Specification  Sheets 

This task will be the responsibility of the AVSI member companies, supported by ENRE.
The aerospace industry cannot derate semiconductor devices on its own. For
regulatory, business and practical reasons, aerospace companies are required to use
devices within the data sheet specifications provided by the manufacturer. Currently,
operating a device at a voltage below the specified operating limits is not supported by
any known supplier [64].

To use derated COTS semiconductor devices in their systems, the aerospace
industry must have the supplier characterize each derated part and publish a technical
specification. The results of research so far, the derating model, and the verification of
that model, provide the basis for showing that derating is a valid method of extending
lifetime. Using this information, the AVSI member companies will have to build a
business case for the semiconductor suppliers to develop specification sheets for
derated devices. This work may be part of the Integrated Aerospace Parts Acquisition
Strategy (see Sec. 1.1.3).


5.2.3 Alternative System Architecture

There is no guarantee the aerospace industry will convince semiconductor suppliers to
develop specifications for derated components. If derated parts are not implemented,
aerospace companies will require an alternative method to mitigate shorter
device lifetimes. One way to accomplish this is to investigate alternative avionics
system design concepts. Lloyd Condra proposed the idea of a “Maintenance Free”
avionics system [65]. His initial concept was:


•  Develop  a  system  architecture  with  small,  modular,  throw-away  cards  and 
standard  ‘dumb’  back-planes. 

•  Consider  each  card  to  be  a  ‘system’. 

•  Systems  can  be  interoperable  (within  and  between  vehicles). 

•  Build  in  redundancy  and  fault  tolerance  at  the  card  (system)  level  to  facilitate 
scheduled  maintenance  and  eliminate  unscheduled  maintenance. 

• Incentivize system provider with new procurement practices, e.g., power-by-the-hour,
etc.

• Upgrade system and insert new technology via predicted migration paths.

Ideas such as this, and other alternative concepts, need to be examined to see if they
would produce systems that mitigate the problem of shorter lifetime devices. This work
would be a much more traditional development process. It would require the
development of requirements for the alternative systems, new system design concepts
and the evaluation of those ideas. This task would require the use of an iterative
systems engineering process to determine and refine the best system concepts.

5.3  Conclusion 

In this dissertation I have validated the idea that future COTS semiconductor devices
may have a shorter inherent lifetime than today’s devices. The result may be ICs having
a lifetime inadequate for long-life military/aerospace applications. Electromigration, hot
carrier effects and TDDB all increase with shrinking device feature sizes,
increasing the potential for failure caused by those mechanisms.

Due  to  market  conditions  and  business  climate,  the  aerospace  industry  must  use 
COTS  devices.  While  semiconductor  suppliers  are  expected  to  maintain  adequate 
device  reliability  for  their  core  customers,  the  aerospace  industry  doesn’t  have  the  clout 
to  have  their  requirements  seriously  considered.  The  aerospace  industry  must  adapt 
COTS  devices  to  their  needs  or  adapt  to  the  realities  of  future  COTS  device  lifetimes. 
Derating  an  IC  from  its  nominal  operating  voltage  is  a  way  to  extend  device  lifetime 
and  adapt  COTS  devices  to  the  needs  of  the  aerospace  industry. 